Compounds for targeting

ABSTRACT

Fusion compounds comprising a target cell-specific portion fused to an oligomeric ribonuclease are disclosed. The inventive compounds are useful as anti-cancer agents. Methods of preparation and use of the inventive compounds are disclosed.

The present invention relates to compounds, some of which may be directly or indirectly cytotoxic combinations of compounds, that have a high avidity for, and can be targeted to, selected cells.

BACKGROUND OF THE PRIOR ART

The cell-specific targeting of compounds which are directly, or indirectly, cytotoxic has been proposed as a way to combat diseases such as cancer. Bagshawe and his co-workers have disclosed (Bagshawe (1987) Br. J. Cancer 56, 531; Bagshawe et al (1988) Br. J. Cancer 58, 700; WO 88/07378) conjugated compounds comprising an antibody or part thereof and an enzyme, the antibody being specific to tumour cell antigens and the enzyme acting to convert an innocuous pro-drug into a cytotoxic compound. The cytotoxic compounds were alkylating agents, eg a benzoic acid mustard released from para-N-bis(2-chloroethyl)aminobenzoyl glutamic acid by the action of Pseudomonas sp. CPG2 enzyme.

An alternative system using different pro-drugs has been disclosed (WO 91/11201) by Epenetos and co-workers. The cytotoxic compounds were cyanogenic monosaccharides or disaccharides, such as the plant compound amygdalin, which release cyanide upon the action of a β-glucosidase and hydroxynitrile lyase.

In a further alternative system, the use of antibody-enzyme conjugates containing the enzyme alkaline phosphatase in conjunction with the pro-drug etoposide 4'-phosphate or 7-(2'-aminoethyl phosphate)mitomycin or a combination thereof has been disclosed (EP 0 302 473; Senter et al (1988) Proc. Natl. Acad. Sci. USA 85, 4842).

Rybak and co-workers have disclosed (Rybak et al (1991) J. Biol. Chem. 266, 21202; WO 91/16069) the cytotoxic potential of a monomeric pancreatic ribonuclease when injected directly into Xenopus oocytes and the cytotoxic potential of monomeric RNase coupled to human transferrin or antibodies directed against the transferrin receptor. The monomeric RNase hybrid proteins were cytotoxic to human erythroleukaemia cells in vitro.

Other approaches are the in vivo application of streptavidin conjugated antibodies followed, after an appropriate period, by radioactive biotin (Hnatowich et al (1988) J. Nucl. Med. 29, 1428-1434), or injection of a biotinylated mAb followed by radioactive streptavidin (Paganelli et al (1990) Int. J. Cancer 45, 1184-1189). A pilot radioimmunolocalisation study in non-small cell lung carcinomas was conducted with encouraging results (Kalofonos et al (1990) J. Nucl. Med. 31, 1791-1796).

Apart from these examples, it is rather more common to see biotinylated antibodies and streptavidin-enzyme conjugates which are used in enzyme-linked immunosorbent assays.

These previous systems have used relatively large antibody-enzyme or antibody-streptavidin or antibody-biotin conjugates and may comprise portions of non-mammalian origin which are highly immunoreactive.

Rapid penetrance (Yokota et al (1992) Cancer Res. 52, 3402-3408) and rapid clearance (Colcher et al (1990) J. Natl. Cancer Inst. 82, 1191-1197) have been demonstrated for single chain Fv antibody fragments (ScFv).

In using the aforementioned cell-specific reagents in a therapeutically useful situation, one of the requirements that needs to be met is for the cell-specific reagent to accumulate to a sufficiently higher level at the target cell than at other cells. A further requirement is that a directly or indirectly cytotoxic reagent is carried to the target cell, and it is preferred that the said cytotoxic reagent is of high potency.

We have now devised improved systems, at least some of which exhibit higher avidities to the selected target cells, and make use of novel, potent directly or indirectly cytotoxic agents.

SUMMARY OF INVENTION

A first aspect of the invention provides a compound comprising a target cell-specific portion and a cytotoxic portion characterised in that the cytotoxic portion has nucleolytic activity.

Suitably, as disclosed below, the cytotoxic portion may have ribonucleolytic activity or it may have DNA endonucleolytic activity.

One aspect of the present invention provides a compound comprising a target cell-specific portion and a directly or indirectly cytotoxic portion, characterised in that the target cell-specific portion recognizes the target cell with high avidity.

A further aspect of the present invention provides a compound comprising a target cell-specific portion and a directly or indirectly cytotoxic portion characterised in that the cytotoxic portion is a sub-unit of an oligomer provided that, if the sub-unit is complexed with another sub-unit of the said oligomer then the said other sub-unit is the cytotoxic portion of a second compound of the invention.

A further aspect of the present invention provides a compound of at least two molecules each comprising a target cell-specific portion and a further portion wherein the molecules are complexed to one another via their further portions.

A further aspect of the present invention provides a compound comprising an oligomeric complex of at least two molecules each comprising a target cell-specific portion wherein the molecules are complexed to one another via their cytotoxic portions.

A further aspect of the present invention provides a compound comprising a target cell-specific portion and a directly or indirectly cytotoxic portion characterised in that the cytotoxic portion contains a binding site for a small molecule wherein the said small-molecule binding site binds but does not modify catalytically the said small molecule.

A further aspect of the present invention provides a compound comprising a target cell-specific portion and a directly or indirectly cytotoxic portion characterised in that the target cell-specific portion comprises two or more binding sites for the target cell, wherein the target cell-specific portion is not an antibody, or bivalent fragment thereof, having respective arms which recognize the same entity as one another.

A further aspect of the present invention provides a compound comprising a target cell-specific portion and a cytotoxic portion characterised in that the cytotoxic portion has DNA endonucleolytic activity.

A further aspect of the invention provides a compound comprising a mediator portion and a directly or indirectly cytotoxic portion.

By "mediator portion" we mean the portion of the compound that recognizes a target cell-specific molecule. The target cell-specific molecule may be a further compound of any of the appropriate preceding aspects of the present invention or it may be a target cell-specific molecule known in the art or it may be a derivative thereof capable of recognition by the mediator portion.

By "high avidity" we mean that the target cell-specific portion recognizes the target cell with a binding constant of at least K_d = 10⁻⁹ M, suitably K_d = 10⁻¹⁰ M, more suitably K_d = 10⁻¹¹ M, more suitably still K_d = 10⁻¹² M, preferably K_d = 10⁻¹⁵ M, more preferably K_d = 10⁻¹⁸ M, more preferably still K_d = 10⁻²¹ M, yet even more preferably K_d = 10⁻²⁴ M, and in further preference K_d = 10⁻²⁷ M or even K_d = 10⁻³⁰ M.

By "target cell specific" portion we mean the portion of the compound which comprises one or more binding sites which recognize and bind to entities on the target cell. The said entities are expressed predominantly, and preferably exclusively, on the said target cell. The target cell specific portion may contain one or more binding sites for different entities expressed on the same target cell type, or one or more binding sites for different entities expressed on two or more different target cell types.

By a "directly cytotoxic agent" we mean an agent which in itself is toxic to the cell if it is to reach, and preferably enter, the said cell.

By an "indirectly cytotoxic agent" we mean an agent which in itself may or may not be non-toxic, but which can bind specifically to a cytotoxic compound, or can bind specifically to a compound which can be converted into a cytotoxic compound by the action of a further reagent.

The entity which is recognised may be any suitable entity which is expressed by tumour cells, virally-infected cells, pathogenic microorganisms, cells introduced as part of gene therapy or even normal cells of the body which, for whatever reason, one wishes to target, but which is not expressed, or at least not with such frequency, in cells which one does not wish to target. The entity which is recognised will often be an antigen. Examples of antigens include those listed in Table 1 below. Monoclonal antibodies which will bind to many of these antigens are already known (for example those given in the Table) but in any case, with today's techniques in relation to monoclonal antibody technology, antibodies can be prepared to most antigens. The antigen-specific portion may be a part of an antibody (for example a Fab fragment) or a synthetic antibody fragment (for example a single chain Fv fragment [ScFv]). Suitable monoclonal antibodies to selected antigens may be prepared by known techniques, for example those disclosed in "Monoclonal Antibodies: A manual of techniques", H Zola (CRC Press, 1988) and in "Monoclonal Hybridoma Antibodies: Techniques and Applications", J G R Hurrell (CRC Press, 1982).

The variable heavy (V_H) and variable light (V_L) domains of the antibody are involved in antigen recognition, a fact first recognised by early protease digestion experiments. Further confirmation was found by "humanisation" of rodent antibodies. Variable domains of rodent origin may be fused to constant domains of human origin such that the resultant antibody retains the antigenic specificity of the rodent parented antibody (Morrison et al (1984) Proc. Natl. Acad. Sci. USA 81, 6851-6855).

That antigenic specificity is conferred by variable domains and is independent of the constant domains is known from experiments involving the bacterial expression of antibody fragments, all containing one or more variable domains. These molecules include Fab-like molecules (Better et al (1988) Science 240, 1041); Fv molecules (Skerra et al (1988) Science 240, 1038); single-chain Fv (ScFv) molecules where the V_H and V_L partner domains are linked via a flexible oligopeptide (Bird et al (1988) Science 242, 423; Huston et al (1988) Proc. Natl. Acad. Sci. USA 85, 5879) and single domain antibodies (dAbs) comprising isolated V domains (Ward et al (1989) Nature 341, 544). A general review of the techniques involved in the synthesis of antibody fragments which retain their specific binding sites is to be found in Winter & Milstein (1991) Nature 349, 293-299.

By "ScFv molecules" we mean molecules wherein the V_H and V_L partner domains are linked via a flexible oligopeptide.

Chimaeric antibodies are discussed by Neuberger et al (1988, 8th International Biotechnology Symposium Part 2, 792-799).

Suitably prepared non-human antibodies can be "humanized" in known ways, for example by inserting the CDR regions of mouse antibodies into the framework of human antibodies.

The advantages of using antibody fragments, rather than whole antibodies, are several-fold. The smaller size of the fragments allows for rapid clearance, and may lead to improved tumour to non-tumour ratios. Fab, Fv, ScFv and dAb antibody fragments can all be expressed in and secreted from E. coli, thus allowing the facile production of large amounts of the said fragments.

Whole antibodies and F(ab')₂ fragments are "bivalent". By "bivalent" we mean that the said antibodies and F(ab')₂ fragments have two antigen combining sites. In contrast, Fab, Fv, ScFv and dAb fragments are monovalent, having only one antigen combining site.

Alternatively, the entity which is recognised may or may not be antigenic but can be recognised and selectively bound to in some other way. For example, it may be a characteristic cell surface receptor such as the receptor for melanocyte-stimulating hormone (MSH) which is expressed in high number in melanoma cells. The cell-specific portion may then be a compound or part thereof which specifically binds to the entity in a non-immune sense, for example as a substrate or analog thereof for a cell-surface enzyme or as a messenger.

Preferably, the high avidity target cell specific portion comprises two or more different binding sites for the target cell.

The different binding sites for the target cell may or may not be two or more different antibodies, or fragments thereof, which are directed to different entities expressed on the target cell. Alternatively, the different binding sites for the target cell may recognize and selectively bind the cell in some other, non-immune sense.

A further alternative is that one or more of the binding sites is an antibody, or part thereof, and that one or more of the binding sites for the target cell recognize and selectively bind the cell in some other, non-immune sense.

A compound which has binding sites for two or more target cell-specific entities may be more specific for binding to the said target cell, and a compound which has more than one of each of the different binding sites may bind to the said target cell with greater avidity. In combining two or more binding sites, which in themselves may be of high specificity but low affinity, it will be possible to generate in the compound of the invention a higher affinity for the target cell whilst retaining the specificity of the binding sites.

TABLE 1
________________________________________________________________________________________
Antigen                   Antibody                      Existing Uses
________________________________________________________________________________________
1. Tumour Associated Antigens

Carcino-embryonic         C46 (Amersham);               Imaging & Therapy of
Antigen                   85A12 (Unipath)               colon/rectum tumours.

Placental Alkaline        H17E2 (ICRF,                  Imaging & Therapy of
Phosphatase               Travers & Bodmer)             testicular and ovarian
                                                        cancers.

Pan Carcinoma             NR-LU-10 (NeoRx               Imaging & Therapy of various
                          Corporation)                  carcinomas incl. small cell
                                                        lung cancer.

Polymorphic               HMFG1 (Taylor-                Imaging & Therapy of ovarian
Epithelial Mucin          Papadimitriou, ICRF)          cancer, pleural effusions.
(Human milk fat
globule)

Human milk mucin          SM-3 (IgG1)¹                  Diagnosis, Imaging & Therapy
core protein                                            of breast cancer.

β-human Chorionic         W14                           Targeting of enzyme (CPG2) to
Gonadotropin                                            human xenograft
                                                        choriocarcinoma in nude mice.
                                                        (Searle et al (1981) Br. J.
                                                        Cancer 44, 137-144)

A Carbohydrate on         L6 (IgG2a)²                   Targeting of alkaline
Human Carcinomas                                        phosphatase. (Senter et al
                                                        (1988) Proc. Natl. Acad. Sci.
                                                        USA 85, 4842-4846)

CD20 Antigen on B         1F5 (IgG2a)³                  Targeting of alkaline
Lymphoma (normal                                        phosphatase. (Senter et al
and neoplastic)                                         (1988) Proc. Natl. Acad. Sci.
                                                        USA 85, 4842-4846)

2. Immune Cell Antigens

Pan T Lymphocyte          OKT-3 (Ortho)                 As anti-rejection therapy for
Surface Antigen                                         kidney transplants.
(CD3)

B-lymphocyte              RFB4 (Janossy,                Immunotoxin therapy of B cell
Surface Antigen           Royal Free Hospital)          lymphoma.
(CD22)

Pan T lymphocyte          H65 (Bodmer,                  Immunotoxin treatment of Acute
Surface Antigen           Knowles ICRF,                 Graft versus Host disease,
(CD5)                     Licensed to Xoma              Rheumatoid Arthritis.
                          Corp., USA)

3. Infectious Agent-Related Antigens

Mumps virus-related       Anti-mumps                    Antibody conjugated to
                          polyclonal antibody           Diphtheria toxin for treatment
                                                        of mumps.

Hepatitis B Surface       Anti HBs Ag                   Immunotoxin against Hepatoma.
Antigen
________________________________________________________________________________________
¹ Burchell et al (1987) Cancer Res. 47, 5476-5482
² Hellstrom et al (1986) Cancer Res. 46, 3917-3923
³ Clarke et al (1985) Proc. Natl. Acad. Sci. USA 82, 1766-1770

Other antigens include alphafoetoprotein, Ca125 and prostate specific antigen.

It is preferable that the two portions of the compound of the invention are produced as a fusion compound by recombinant DNA techniques whereby a length of DNA comprises respective regions encoding the two portions of the compound of the invention either adjacent one another or separated by a region encoding a linker peptide which does not destroy the desired properties of the compound. The benefits in making the compound of the invention using recombinant DNA techniques are several-fold. Firstly, it enables a high degree of precision with which the two portions of the compound can be joined together. Secondly, the construction of compounds which are "hetero-oligomeric" can be controlled by the expression of the different recombinant DNA molecules encoding each of the different types of subunit of the "hetero-oligomer" in the same host cell.
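The joining of two coding regions, with or without a linker-encoding region, can be sketched as follows. This is a minimal illustration only: the sequences, the (Gly)₄Ser linker and the function name are hypothetical and are not taken from the specification. It shows the one constraint the text implies, namely that both portions must end up in a single reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def fuse_in_frame(first_cds, second_cds, linker=""):
    """Join two coding regions, optionally via a linker, in one reading frame.

    A terminal stop codon on the upstream region is removed so that
    translation continues into the linker and the downstream region.
    """
    first = first_cds.upper()
    second = second_cds.upper()
    linker = linker.upper()
    if first[-3:] in STOP_CODONS:
        first = first[:-3]
    for name, part in (("first", first), ("linker", linker), ("second", second)):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} region is not a whole number of codons")
    return first + linker + second

# Hypothetical toy sequences (assumptions, for illustration only).
target_portion = "ATGGCTAAATAA"      # Met-Ala-Lys-stop
cytotoxic_portion = "ATGGAAGTTTAA"   # Met-Glu-Val-stop
gly4ser_linker = "GGTGGAGGCGGTTCA"   # Gly-Gly-Gly-Gly-Ser

fusion = fuse_in_frame(target_portion, cytotoxic_portion, gly4ser_linker)
codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
print(codons)
```

Passing an empty linker gives the "adjacent one another" arrangement; a real construct would also need checking for internal stop codons and unwanted restriction sites.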

By "hetero-oligomer" we mean those compounds in which two or more different cell-specific portions are joined to either the same or to different subunits which are capable of oligomerisation. The expression, in the same host cell, of two compounds, A and B, each with different target cell specific portions but with a common second portion capable of oligomerisation will result in a mixed population of compounds. For example, if the common second portion is capable of dimerisation, three potential compounds will be produced: A₂, AB and B₂, in a ratio of 1:2:1, respectively.
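The 1:2:1 ratio follows from random pairing of subunits. The sketch below reproduces it under two assumptions that the text does not state explicitly: A and B are expressed in equal amounts, and the common dimerising portion pairs without preference. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def dimer_distribution(subunit_fractions):
    """Expected fraction of each unordered dimer under random pairing.

    subunit_fractions maps a subunit name to its share of the expressed pool.
    """
    dist = {}
    for (a, pa), (b, pb) in product(subunit_fractions.items(), repeat=2):
        key = "".join(sorted((a, b)))  # "AB" and "BA" are the same species
        dist[key] = dist.get(key, Fraction(0)) + pa * pb
    return dist

# Assumption: equal expression of the two compounds.
pool = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
dist = dimer_distribution(pool)
for species in sorted(dist):
    print(species, dist[species])
```

With unequal expression the same function gives the skewed distribution, for example 1:6:9 for a pool that is one quarter A and three quarters B.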

The separation of the desired compound with each of the different cell specific portions, that is AB, can be achieved by two-step affinity chromatography.

Application of the mixture of compounds to an affinity column specific for A will result in the binding of A₂ and AB. These compounds are eluted from this first column, and then applied to an affinity column specific for B. This will result in AB, but not A₂, being bound to the column. Finally, the desired product, AB, can be eluted.

Of course, the order in which the affinity columns are used is not important.
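The two-step separation can be modelled as two successive filters. The sketch below is an idealised illustration: it assumes complete binding and complete elution at each step, and the function name is hypothetical.

```python
def affinity_column(species, ligand):
    """Split species into (bound, flow_through) for a column specific for `ligand`.

    A species is retained if it carries at least one copy of the portion
    recognised by the column.
    """
    bound = [s for s in species if ligand in s]
    flow_through = [s for s in species if ligand not in s]
    return bound, flow_through

# The mixed population from co-expression of A and B with a dimerising portion.
mixture = [("A", "A"), ("A", "B"), ("B", "B")]

# Column specific for A first, then column specific for B.
eluate_a, _ = affinity_column(mixture, "A")      # retains A2 and AB
a_then_b, _ = affinity_column(eluate_a, "B")     # retains AB only

# Reverse order: column specific for B first, then column specific for A.
eluate_b, _ = affinity_column(mixture, "B")      # retains AB and B2
b_then_a, _ = affinity_column(eluate_b, "A")     # retains AB only

print(a_then_b, b_then_a)
```

Both orders leave only the AB species, which is why the order of the columns does not matter.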

The same principle of separating those compounds with two or more different binding sites can be applied to the purification of the desired compounds from mixtures of other hetero-oligomers.

Conceivably, the two portions of the compound may overlap wholly or partly.

The DNA is then expressed in a suitable host to produce a polypeptide comprising the compound of the invention. Thus, the DNA encoding the polypeptide constituting the compound of the invention may be used in accordance with known techniques, appropriately modified in view of the teachings contained herein, to construct an expression vector, which is then used to transform an appropriate host cell for the expression and production of the polypeptide of the invention. Such techniques include those disclosed in U.S. Pat. Nos. 4,440,859 issued Apr. 3, 1984 to Rutter et al, 4,530,901 issued Jul. 23, 1985 to Weissman, 4,582,800 issued Apr. 15, 1986 to Crowl, 4,677,063 issued Jun. 30, 1987 to Mark et al, 4,678,751 issued Jul. 7, 1987 to Goeddel, 4,704,362 issued Nov. 3, 1987 to Itakura et al, 4,710,463 issued Dec. 1, 1987 to Murray, 4,757,006 issued Jul. 12, 1988 to Toole, Jr. et al, 4,766,075 issued Aug. 23, 1988 to Goeddel et al and 4,810,648 issued Mar. 7, 1989 to Stalker, all of which are incorporated herein by reference.

The DNA encoding the polypeptide constituting the compound of the invention may be joined to a wide variety of other DNA sequences for introduction into an appropriate host. The companion DNA will depend upon the nature of the host, the manner of the introduction of the DNA into the host, and whether episomal maintenance or integration is desired.

Generally, the DNA is inserted into an expression vector, such as a plasmid, in proper orientation and correct reading frame for expression. If necessary, the DNA may be linked to the appropriate transcriptional and translational regulatory control nucleotide sequences recognised by the desired host, although such controls are generally available in the expression vector. The vector is then introduced into the host through standard techniques. Generally, not all of the hosts will be transformed by the vector. Therefore, it will be necessary to select for transformed host cells. One selection technique involves incorporating into the expression vector a DNA sequence, with any necessary control elements, that codes for a selectable trait in the transformed cell, such as antibiotic resistance. Alternatively, the gene for such selectable trait can be on another vector, which is used to co-transform the desired host cell.

Host cells that have been transformed by the recombinant DNA of the invention are then cultured for a sufficient time and under appropriate conditions known to those skilled in the art in view of the teachings disclosed herein to permit the expression of the polypeptide, which can then be recovered.

Many expression systems are known, including bacteria (for example E. coli and Bacillus subtilis), yeasts (for example Saccharomyces cerevisiae), filamentous fungi (for example Aspergillus), plant cells, animal cells and insect cells.

Those vectors that include a replicon such as a procaryotic replicon can also include an appropriate promoter such as a procaryotic promoter capable of directing the expression (transcription and translation) of the genes in a bacterial host cell, such as E. coli, transformed therewith.

A promoter is an expression control element formed by a DNA sequence that permits binding of RNA polymerase and transcription to occur. Promoter sequences compatible with exemplary bacterial hosts are typically provided in plasmid vectors containing convenient restriction sites for insertion of a DNA segment of the present invention.

Typical procaryotic vector plasmids are pUC18, pUC19, pBR322 and pBR329 available from Biorad Laboratories (Richmond, Calif., USA) and pTrc99A and pKK223-3 available from Pharmacia, Piscataway, N.J., USA.

A typical mammalian cell vector plasmid is pSVL available from Pharmacia, Piscataway, N.J., USA. This vector uses the SV40 late promoter to drive expression of cloned genes, the highest level of expression being found in T antigen-producing cells, such as COS-1 cells.

An example of an inducible mammalian expression vector is pMSG, also available from Pharmacia. This vector uses the glucocorticoid-inducible promoter of the mouse mammary tumour virus long terminal repeat to drive expression of the cloned gene.

Useful yeast plasmid vectors are pRS403-406 and pRS413-416, which are generally available from Stratagene Cloning Systems, La Jolla, Calif. 92037, USA. Plasmids pRS403, pRS404, pRS405 and pRS406 are Yeast Integrating plasmids (YIps) and incorporate the yeast selectable markers his3, trp1, leu2 and ura3. Plasmids pRS413-416 are Yeast Centromere plasmids (YCps).

A variety of methods have been developed to operatively link DNA to vectors via complementary cohesive termini. For instance, complementary homopolymer tracts can be added to the DNA segment to be inserted and to the vector DNA. The vector and DNA segment are then joined by hydrogen bonding between the complementary homopolymeric tails to form recombinant DNA molecules.

Synthetic linkers containing one or more restriction sites provide an alternative method of joining the DNA segment to vectors. The DNA segment, generated by endonuclease restriction digestion as described earlier, is treated with bacteriophage T4 DNA polymerase or E. coli DNA polymerase I, enzymes that remove protruding 3'-single-stranded termini with their 3'-5'-exonucleolytic activities, and fill in recessed 3'-ends with their polymerizing activities.

The combination of these activities therefore generates blunt-ended DNA segments. The blunt-ended segments are then incubated with a large molar excess of linker molecules in the presence of an enzyme that is able to catalyze the ligation of blunt-ended DNA molecules, such as bacteriophage T4 DNA ligase. Thus, the products of the reaction are DNA segments carrying polymeric linker sequences at their ends. These DNA segments are then cleaved with the appropriate restriction enzyme and ligated to an expression vector that has been cleaved with an enzyme that produces termini compatible with those of the DNA segment.

Synthetic linkers containing a variety of restriction endonuclease sites are commercially available from a number of sources including International Biotechnologies Inc, New Haven, Conn., USA.

A desirable way to modify the DNA encoding the polypeptide of the invention is to use the polymerase chain reaction as disclosed by Saiki et al (1988) Science 239, 487-491.

In this method the DNA to be enzymatically amplified is flanked by two specific oligonucleotide primers which themselves become incorporated into the amplified DNA. The said specific primers may contain restriction endonuclease recognition sites which can be used for cloning into expression vectors using methods known in the art.
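Primers carrying restriction sites can be sketched as below. This is an illustration under stated assumptions, not a primer design protocol: the template is a hypothetical toy sequence, the choice of EcoRI (GAATTC) and BamHI (GGATCC) sites, the two-base 5' clamp and the 18-base annealing length are all arbitrary, and no melting temperature or secondary structure checks are made.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def design_primers(template, fwd_site, rev_site, anneal_len=18, clamp="GC"):
    """Build forward and reverse primers with 5' restriction sites.

    Each primer is: clamp + restriction site + template-matching region.
    The reverse primer anneals to the 3' end of the template's top strand,
    so its matching region is the reverse complement of that end.
    """
    template = template.upper()
    if len(template) < anneal_len:
        raise ValueError("template shorter than annealing region")
    if fwd_site in template or rev_site in template:
        raise ValueError("restriction site also occurs inside the template")
    forward = clamp + fwd_site + template[:anneal_len]
    reverse = clamp + rev_site + reverse_complement(template[-anneal_len:])
    return forward, reverse

ECORI = "GAATTC"   # EcoRI recognition sequence
BAMHI = "GGATCC"   # BamHI recognition sequence

# Hypothetical template (assumption, for illustration only).
template = "ATGGCTAAAGGTGGAGGCGGTTCAATGGAAGTTTAA"
forward, reverse = design_primers(template, ECORI, BAMHI)
print(forward)
print(reverse)
```

Because the sites sit at the 5' ends of the primers, they are incorporated at both ends of the amplified DNA and can then be cut for directional cloning into a vector cleaved with the same pair of enzymes.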

Exemplary genera of yeast contemplated to be useful in the practice of the present invention are Pichia, Saccharomyces, Kluyveromyces, Candida, Torulopsis, Hansenula, Schizosaccharomyces, Citeromyces, Pachysolen, Debaryomyces, Metschnikowia, Rhodosporidium, Leucosporidium, Botryoascus, Sporidiobolus, Endomycopsis, and the like. Preferred genera are those selected from the group consisting of Pichia, Saccharomyces, Kluyveromyces, Yarrowia and Hansenula. Examples of Saccharomyces are Saccharomyces cerevisiae, Saccharomyces italicus and Saccharomyces rouxii. Examples of Kluyveromyces are Kluyveromyces fragilis and Kluyveromyces lactis. Examples of Hansenula are Hansenula polymorpha, Hansenula anomala and Hansenula capsulata. Yarrowia lipolytica is an example of a suitable Yarrowia species.

Methods for the transformation of S. cerevisiae are taught generally in EP 251 744, EP 258 067 and WO 90/01063, all of which are incorporated herein by reference.

Suitable promoters for S. cerevisiae include those associated with the PGK1 gene, GAL1 or GAL10 genes, CYC1, PHO5, TRP1, ADH1, ADH2, the genes for glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, triose phosphate isomerase, phosphoglucose isomerase, glucokinase, α-mating factor pheromone, a-mating factor pheromone, the PRB1 promoter, the GUT2 promoter, and hybrid promoters involving hybrids of parts of 5' regulatory regions with parts of 5' regulatory regions of other promoters or with upstream activation sites (eg the promoter of EP-A-258 067).

The transcription termination signal is preferably the 3' flanking sequence of a eukaryotic gene which contains proper signals for transcription termination and polyadenylation. Suitable 3' flanking sequences may, for example, be those of the gene naturally linked to the expression control sequence used, i.e. may correspond to the promoter. Alternatively, they may be different, in which case the termination signal of the S. cerevisiae ADH1 gene is preferred.

The present invention also relates to a host cell transformed with a polynucleotide vector construct of the present invention. The host cell can be either procaryotic or eucaryotic. Bacterial cells are preferred procaryotic host cells and typically are a strain of E. coli such as, for example, the E. coli strains DH5 available from Bethesda Research Laboratories Inc., Bethesda, Md., USA, and RR1 available from the American Type Culture Collection (ATCC) of Rockville, Md., USA (No ATCC 31343). Preferred eucaryotic host cells include yeast and mammalian cells, preferably vertebrate cells such as those from a mouse, rat, monkey or human fibroblastic cell line. Preferred eucaryotic host cells include Chinese hamster ovary (CHO) cells available from the ATCC as CCL61, NIH Swiss mouse embryo cells NIH/3T3 available from the ATCC as CRL 1658 and monkey kidney-derived COS-1 cells available from the ATCC as CRL 1650.

Transformation of appropriate cell hosts with a DNA construct of the present invention is accomplished by well known methods that typically depend on the type of vector used. With regard to transformation of procaryotic host cells, see, for example, Cohen et al, Proc. Natl. Acad. Sci. USA, 69: 2110 (1972); and Sambrook et al, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989). Transformation of yeast cells is described in Sherman et al, Methods In Yeast Genetics, A Laboratory Manual, Cold Spring Harbor, N.Y. (1986). The method of Beggs, Nature, 275: 104-109 (1978) is also useful. With regard to vertebrate cells, reagents useful in transfecting such cells, for example calcium phosphate and DEAE-dextran or liposome formulations, are available from Stratagene Cloning Systems, or Life Technologies Inc, Gaithersburg, Md. 20877, USA.

Successfully transformed cells, ie cells that contain a DNA construct of the present invention, can be identified by well known techniques. For example, cells resulting from the introduction of an expression construct of the present invention can be grown to produce the polypeptide of the invention. Cells can be harvested and lysed and their DNA content examined for the presence of the DNA using a method such as that described by Southern, J. Mol. Biol., 98: 503 (1975) or Berent et al, Biotech., 3: 208 (1985). Alternatively, the presence of the protein in the supernatant can be detected using antibodies as described below.

In addition to directly assaying for the presence of recombinant DNA, successful transformation can be confirmed by well known immunological methods when the recombinant DNA is capable of directing the expression of the protein. For example, cells successfully transformed with an expression vector produce proteins displaying appropriate antigenicity. Samples of cells suspected of being transformed are harvested and assayed for the protein using suitable antibodies.

Thus, in addition to the transformed host cells themselves, the present invention also contemplates a culture of those cells, preferably a monoclonal (clonally homogeneous) culture, or a culture derived from a monoclonal culture, in a nutrient medium. Preferably, the culture also contains the protein.

Nutrient media useful for culturing transformed host cells are well known in the art and can be obtained from several commercial sources.

Alternatively, the target-cell specific and second portions of the compound of the invention are linked together by any of the conventional ways of cross-linking polypeptides, such as those generally described in O'Sullivan et al Anal. Biochem. (1979) 100, 100-108. For example, the antibody portion may be enriched with thiol groups and the enzyme portion reacted with a bifunctional agent capable of reacting with those thiol groups, for example the N-hydroxysuccinimide ester of iodoacetic acid (NHIA) or N-succinimidyl-3-(2-pyridyldithio)propionate (SPDP). Amide and thioether bonds, for example achieved with m-maleimidobenzoyl-N-hydroxysuccinimide ester, are generally more stable in vivo than disulphide bonds.

Some of the various compounds of the invention are illustrated diagrammatically in FIG. 1. C and D are the target cell-specific portions, and X is the cytotoxic portion. Of course, X may form higher order oligomers than those illustrated, for example trimers, tetramers, pentamers or hexamers.

In FIG. 1(a) to 1(d) C and D are shown binding to entities on either the same, or different cells.

In one embodiment of the invention, C and D recognize different molecules on the same target cell wherein the molecules on the same target cell are not confined to that cell type but may occur on a few other cell types. In particular, C may recognize molecules on cell types I, II and III, whereas D may recognize molecules on cell types I, IV and V. Thus a compound of the invention comprising C and D as the target cell-specific portion will have greater specificity for cell type I compared with cell types II, III, IV and V. This aspect of the invention is particularly helpful, as there have been very few completely target cell-specific molecules discovered, whereas molecules which occur on a few cell types, and which are useful in this aspect of the invention, are well known. Such molecules are usually cell-surface antigens for which cross-reactive antibodies are known. Examples of such molecules are given in Table 2.

TABLE 2
______________________________________________________________________________
Antigen   Cell-type                                  Antibody
______________________________________________________________________________
CD9       Pre-B cells, monocytes, platelets          MM2/57 (IgG2b, mouse)
CALLA     Lymphoid progenitor cells, granulocytes    B-E3 (IgG2a, mouse)
CD13      Myeloid monocytes, granulocytes            B-F10 (IgG1, mouse)
CD24      B-cells, granulocytes                      ALB-9 (IgG1, mouse)
CD61      Platelets, megakaryocytes                  PM 6/13 (IgG1, mouse)
______________________________________________________________________________

The antibodies described in Table 2 are generally available from Serotec, Oxford OX5 1BR, UK.
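By way of illustration only, the embodiment in which C and D recognize molecules on overlapping sets of cell types can be sketched as a simple count of how many of the two portions are able to bind each cell type. The cell-type labels I to V are those used in the text; the function and variable names are assumptions introduced for this sketch and are not part of the disclosure.

```python
# Illustrative sketch only: labels I-V follow the text, all names are assumptions.
C_CELL_TYPES = {"I", "II", "III"}  # cell types carrying the molecule recognised by C
D_CELL_TYPES = {"I", "IV", "V"}    # cell types carrying the molecule recognised by D


def portions_bound(c_types, d_types):
    """Return, for each cell type, how many of the portions C and D can bind it."""
    return {
        cell: int(cell in c_types) + int(cell in d_types)
        for cell in sorted(c_types | d_types)
    }


bound = portions_bound(C_CELL_TYPES, D_CELL_TYPES)
# Cell types that can be engaged by both portions at once.
both = [cell for cell, n in bound.items() if n == 2]

print(bound)
print("Bound by both C and D:", both)
```

Only cell type I is engaged by both portions in this sketch, which is the basis for the greater specificity described above; the count says nothing about the actual binding strengths involved.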

Preferably, the cytotoxic portion of the compound of the invention is capable of oligomerisation. Attachment of the target-cell specific portion to a cytotoxic portion capable of oligomerisation provides a method for increasing the number of binding sites to the target cell. For example, if the target cell-specific portion is joined to a portion capable of forming a dimer then the number of target cell-specific binding sites is two; if the target cell-specific portion is joined to a portion capable of forming a tetramer then the number of target cell-specific binding sites is four. The number of target cell-specific binding sites is greater than one and the compounds may therefore have a greater avidity for the target cell than do compounds which only have one target cell-specific binding site.
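The relationship between oligomeric state and the number of target cell-specific binding sites may be sketched as follows. This is a minimal sketch assuming that each subunit carries one target cell-specific portion unless stated otherwise; the dictionary and function names are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only; names and the one-site-per-subunit default are assumptions.
OLIGOMER_ORDER = {
    "monomer": 1,
    "dimer": 2,
    "trimer": 3,
    "tetramer": 4,
    "pentamer": 5,
    "hexamer": 6,
}


def target_binding_sites(oligomer, sites_per_subunit=1):
    """Number of target cell-specific binding sites in the assembled oligomer."""
    if sites_per_subunit < 1:
        raise ValueError("sites_per_subunit must be at least 1")
    return OLIGOMER_ORDER[oligomer] * sites_per_subunit


for state in ("dimer", "tetramer"):
    print(state, target_binding_sites(state))
```

The sketch counts sites only; it does not attempt to estimate avidity, which depends on geometry and antigen density as well as on valency.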

It is preferable for the cytotoxic portion of the compound of the invention capable of oligomerisation to contain no interchain disulphide bonds nor intrachain disulphide bonds; to be well characterised; to be non-toxic; to be stable; to be amenable to preparation in a form suitable for pre-clinical or clinical use or be in pre-clinical or clinical use; and for the subunit monomers to have a high affinity for each other, that is they contain one or more subunit binding sites.

Preferably, each subunit of the cytotoxic portion of the compound of the invention contains a binding site for a small molecule, the small molecule being capable of being conjugated to any of the following compounds: radioactive compound; spin-labelled compound; drug; pro-drug; radionuclide; protein including enzyme; antibody; or toxin.

In a preferred embodiment of the invention, the cytotoxic portion is streptavidin. Streptavidin is a homotetrameric molecule of M_(r) = 60000 (subunit M_(r) = 15000) and is produced by Streptomyces. Streptavidin binds four molecules of the water-soluble vitamin biotin with high specificity and affinity (K_(d) = 10⁻¹⁵ M) although isolated subunits possess a very much lower affinity for biotin (K_(d) = 10⁻⁸ M). Each subunit of streptavidin has a tightly-packed "core", with relatively unstructured amino- and carboxyl-terminal extensions. These extensions are believed to contribute to the formation of higher order aggregates of streptavidin. Many commercial forms of streptavidin are extensively proteolysed, have lost their unstructured extensions, and form stable tetramers (Bayer et al (1989) Biochem J. 259, 369-376; Bayer et al (1990) Methods Enzymol. 184, 51-67). The mature form of the protein has been the subject of recent research and is becoming increasingly well characterised (Gitlin et al (1988) Biochem J. 256, 279-282; Gitlin et al (1990) Biochem J. 269, 527-530; Sano & Cantor (1990) J. Biol. Chem. 265, 3369-3373) and the gene has been cloned and sequenced (Argarana et al (1986) Nucl. Acids Res. 14, 1871-1872) and expressed in E. coli (Sano & Cantor (1990) Proc. Natl. Acad. Sci. USA 87, 142-146). A modified form of the gene is available commercially from British Bio-technology Ltd, Oxford, UK.
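The practical consequence of the two dissociation constants quoted above can be illustrated with the standard single-site equilibrium expression, fraction bound = [L] / ([L] + K_d). The following is a minimal sketch assuming simple, independent 1:1 binding at equilibrium; the free biotin concentration of 1 nM is a hypothetical value chosen only for illustration, and all names are assumptions.

```python
# Illustrative sketch only. K_d values are those quoted in the text; the free
# biotin concentration is a hypothetical value and 1:1 binding is assumed.
KD_TETRAMER_M = 1e-15  # intact streptavidin
KD_SUBUNIT_M = 1e-8    # isolated subunit
FREE_BIOTIN_M = 1e-9   # hypothetical free ligand concentration (1 nM)


def fraction_bound(free_ligand_molar, kd_molar):
    """Equilibrium occupancy of a single binding site: [L] / ([L] + K_d)."""
    if free_ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and K_d must be > 0")
    return free_ligand_molar / (free_ligand_molar + kd_molar)


for label, kd in (("tetramer", KD_TETRAMER_M), ("isolated subunit", KD_SUBUNIT_M)):
    print(f"{label}: fraction of sites occupied = {fraction_bound(FREE_BIOTIN_M, kd):.6f}")

# Ratio of the two dissociation constants (about ten million fold).
affinity_ratio = KD_SUBUNIT_M / KD_TETRAMER_M
print(f"affinity ratio = {affinity_ratio:.1e}")
```

Under these assumptions the tetramer is essentially saturated at a ligand concentration at which the isolated subunit is largely unoccupied, which is consistent with the preference for a portion that retains the subunit-binding sites.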

Of course, for the invention to work the cytotoxic portion may comprise intact streptavidin, or it may comprise a fragment or fragments of streptavidin retaining at least the biotin- and subunit-binding sites.

Of course, the cytotoxic portion may comprise other molecules which bind biotin with high affinity, such as intact avidin, or it may comprise a fragment or fragments of avidin retaining at least the biotin- and subunit-binding sites. A comparison of avidin and streptavidin is made in Table 3. As avidin is naturally glycosylated, it may be desirable to express the DNA encoding the compound of the invention in a eukaryotic cell such as a yeast, mammalian or insect cell.

TABLE 3
______________________________________________________________________________
                   Avidin                                  Streptavidin
______________________________________________________________________________
Source             Tissues and egg-whites of birds,        Streptomyces avidinii
                   reptiles and amphibia
Glycoprotein       yes                                     no
pI                 10                                      5
M_(r) (subunit)    67,000                                  60,000
Oligomeric state   Tetramer                                Tetramer
______________________________________________________________________________

By "subunit-binding sites" we mean those parts of the monomers that are necessary for the monomers to combine with one or more other monomers to produce an oligomer.

Biotin has an extremely high affinity for streptavidin (K_(d) = 10⁻¹⁵ M) and at the same time is small enough to diffuse rapidly through most tissues in the body. Some of the biotin conjugates useful in the invention are known in the art, and it is preferred that the biotin is conjugated via a flexible linker arm to reduce any steric hindrance to the binding of the biotin portion of the conjugate to streptavidin or avidin.

Examples of biotin conjugates useful in the invention are biotinylated growth factors and cytokines such as TNFα-biotin and EGF-biotin which are generally available from Boehringer Mannheim, Mannheim, Germany, and biotin-alkaline phosphatase, biotin-fluorescein, biotin-peroxidase and other conjugates generally available from Calbiochem-Novabiochem, Nottingham, UK. Activated biotin reagents, suitable for conjugating to other molecules, are generally available from Fluka, Buchs, Switzerland.

In a second preferred embodiment of the invention, the cytotoxic portion is a dimeric compound with ribonucleolytic activity, such as a ribozyme, but preferably ribonuclease (RNase). The enzymes of the RNase family are able to degrade single-stranded RNA molecules to smaller polynucleotides and are directly cytotoxic when intracellular. Bovine seminal RNase (BSRNase) has activities in addition to its RNA-degrading activity, namely anti-tumour (Vescia et al (1980) Cancer Res. 40, 3740-3744; Vescia & Tramontano (1981) Mol. Cell. Biochem. 36, 125-128); immunosuppressive (Tamburrini et al (1990) Eur. J. Biochem. 190, 145-148); activation by interferon-γ (Schein et al (1990) Nucl. Acids Res. 18, 1057); and anti-spermatogenic (Doital & Matonsek (1973) J. Reprod. Fertil. 33, 263-274). BSRNase is a dimer and forms two unique disulphide bridges across the subunit interface (Piccoli et al (1988) Biochem J. 253, 329-336). The cDNA encoding the precursor to BSRNase can be prepared using the methods disclosed by Preuss et al (1990) FEBS Lett. 270, 229-232.

Of course, for the invention to work the cytotoxic portion may comprise intact BSRNase, or it may comprise a fragment or fragments of BSRNase retaining at least the active site and subunit-binding sites.

It is further preferred if the fusion with the RNase comprises the sequence KDEL (SEQ ID No 29) at, or near to, the C-terminus of the protein.

It is still further preferred if a linker sequence is present at the N-terminus of the RNase to allow the N-terminus to be more flexible and increase the likelihood of dimer formation.

Preferably, a disulphide-loop-containing sequence which allows an RNase to be linked to a ScFv via a disulphide bond is present in a fusion protein.

In one embodiment of the invention, the cytotoxic portion is a compound with DNA endonucleolytic activity such as copper-phenanthroline adducts but preferably is a DNA endonuclease, for example deoxyribonuclease-I (DNase-I), which is an endonuclease which cleaves double-stranded DNA to yield 5' phosphorylated polynucleotides. It does not cut all DNA sites with the same frequency as it is affected by the local structure of the DNA (specifically, the size of the minor groove).

Alternatively, the DNA endonuclease could be a type II restriction endonuclease. Type II restriction endonucleases are enzymes isolated from microorganisms, usually bacteria, which cleave double-stranded DNA at specific sequences. Typically, the type II restriction endonucleases recognize palindromic sequences in DNA and cleave both strands of the DNA within or adjacent to the recognition site. Type II restriction enzymes are dimers of identical subunits, and, for example, EcoRI is a homodimer of 31 kDa subunits which recognizes the sequence 5'-GAATTC-3'.

Other type II restriction enzymes recognize different hexanucleotide sequences, for example BamHI recognizes 5'-GGATCC-3' and HindIII recognizes 5'-AAGCTT-3'. In addition, type II restriction enzymes which recognize different numbers of bases are known, for example, MspI recognizes 5'-CCGG-3', Sau3AI recognizes 5'-GATC-3', HinfI recognizes 5'-GANTC-3' and NotI recognizes 5'-GCGGCCGC-3'. Of course, the fewer specific bases in the recognition sequence, the more likely that any DNA molecule will be cleaved by the cognate type II endonuclease.
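The dependence of cleavage likelihood on the number of specific bases can be made concrete with a short sketch. It assumes a random DNA sequence in which the four bases are equally frequent and independent, so that a site with n specific bases is expected at any given position with probability (1/4)^n; real genomes deviate from this. The recognition sequences are those listed above, "N" is treated as any base, and the function names are assumptions introduced for illustration.

```python
import re

# Recognition sequences quoted in the text; "N" matches any base.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "MspI": "CCGG",
    "Sau3AI": "GATC",
    "HinfI": "GANTC",
    "NotI": "GCGGCCGC",
}

BASE_PATTERN = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def expected_frequency(site):
    """Per-position probability of the site in random DNA with equal base usage."""
    specific_bases = sum(1 for base in site if base != "N")
    return 0.25 ** specific_bases


def find_sites(site, dna):
    """0-based start positions of every (possibly overlapping) occurrence of the site."""
    pattern = "(?=" + "".join(BASE_PATTERN[base] for base in site) + ")"
    return [match.start() for match in re.finditer(pattern, dna.upper())]


for enzyme, site in SITES.items():
    print(f"{enzyme:8s} {site:9s} about one site per {round(1 / expected_frequency(site))} bp")
```

Under the stated assumption a four-base site such as that of MspI or Sau3AI is expected about once every 256 bp, a six-base site about once every 4096 bp, and the eight-base NotI site about once every 65536 bp.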

The gene for the bovine DNase I has been chemically synthesized and expressed in E. coli (Worrall & Connolly (1990) J. Biol. Chem. 265, 21889-21895). The gene for the human enzyme has been cloned from a human pancreatic cDNA library constructed in λgt10, and the enzyme has been expressed in human cell culture and used in the relief of cystic fibrosis symptoms, by degrading the viscous DNA and so reducing the viscosity of sputum (Shak et al (1990) Proc. Natl. Acad. Sci. USA 87, 9188-9192; Hubbard et al (1992) N. Engl. J. Med. 326, 812-815). All the enzymes are compact, monomeric proteins of about 29 kDa (260 amino acids); when glycosylated the human enzyme is about 35 kDa. DNase I is dependent on divalent cations (Ca²⁺, Mg²⁺) for activity. The human enzyme is about 75% identical to the bovine enzyme at the amino acid sequence level. The synthetic gene encoding the bovine DNase-I can be prepared using the methods disclosed by Worrall & Connolly (1990) loc. cit.

The enzyme from bovine pancreas has been purified and crystallized, and a high resolution structure determined at 2 Å (Suck & Oefner (1986) J. Mol. Biol. 192, 605-632).

One aspect of the invention is the introduction into the targeted cell of the DNase I enzyme. During stages of mitosis, when the nuclear membrane is dissolved, the chromosomal DNA of the said targeted cell will be susceptible to nuclease attack. In this embodiment of the invention DNase I will be particularly cytotoxic to rapidly dividing cells, such as tumour cells.

A further aspect of the invention is the incorporation into the compound of the invention of a nuclear localisation sequence from the SV40 large T antigen (Kalderon et al (1984) Cell 39, 499-509). The said nuclear localisation sequence is PKKKRKV (SEQ ID No 1), or analogues thereof, and a DNA fragment encoding the said sequence, or analogues thereof, may or may not be incorporated into the gene expressing the compound of the invention containing DNase I as the second portion.

Inclusion of the said nuclear localisation sequence will allow the compound of the invention to gain access to the chromosomal DNA during the periods of the cell cycle when the nuclear membrane is intact, as the nuclear pores are permeable to large macromolecules incorporating the said nuclear localisation sequence, or analogues thereof.
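Whether a given construct carries the said nuclear localisation sequence can be checked directly on its one-letter amino acid sequence. The following is a minimal sketch; the motif PKKKRKV is that given in the text, whereas the toy fusion sequence and the function name are hypothetical and do not correspond to any sequence disclosed herein. Analogues of the motif are not detected by this exact-match search.

```python
# Illustrative sketch only. The motif is from the text; the toy sequence is hypothetical.
SV40_NLS = "PKKKRKV"


def nls_positions(protein, motif=SV40_NLS):
    """Return the 0-based start positions of the motif in a one-letter protein sequence."""
    protein = protein.upper()
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy sequence with the motif inserted after six residues.
toy_fusion = "MAQVQL" + SV40_NLS + "GGSLKIAAF"

print(nls_positions(toy_fusion))
print(nls_positions("MAQVQLGGSLKIAAF"))
```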

For the invention to work, of course, the cytotoxic portion may comprise a fragment of RNase or of DNA endonuclease which retains its enzymatic activity, for example a fragment retaining the active site and, in the case of the dimeric RNase and the restriction endonuclease, the subunit-binding site.

A further aspect of the invention is that the RNase and the DNase are of mammalian, preferably human, origin. The use of the said mammalian proteins as the second, functional portion of the compound of the invention is advantageous as such compounds are less likely to give rise to undesirable immune reactions.

Many target cell-specific molecules are known, such as those disclosed in Table 1, which are not joined to a further directly or indirectly cytotoxic portion, but may nevertheless be useful in directing cytotoxic agents to a target cell.

Thus in a further aspect of the invention a compound comprises a mediator portion and a directly or indirectly cytotoxic portion. The mediator may recognize the native target cell-specific molecule, but it is preferable for the mediator to recognize a derivative of the said molecule.

In the case of antibodies, the native target cell-specific molecule may be recognised by the mediator via its Fc portion.

The said derivative may be made by joining a moiety, such as a small molecule, for example a hapten, to the said molecule, and may be recognised if the mediator is, for example, an antibody or fragment thereof.

The advantage in using this method is that the same moiety may be joined to all types of target cell-specific molecules, and then only one compound, comprising a mediator which recognizes the said moiety and a directly or indirectly cytotoxic portion, may be used to deliver the cytotoxic agent to the target cell.

In one embodiment of the invention the mediator is ScFv_(NP), and the moiety recognised by the said ScFv_(NP) is the hapten 4-hydroxy-3-nitrophenylacetic acid (NP) or 4-hydroxy-3-iodophenylacetic acid, and the target cell-specific molecule is an antibody.

Other haptens are suitable as are other molecules, such as peptides, that can be recognised by the mediator. Conveniently the peptide is the core mucin peptide.

Before such molecules can be regarded as suitable candidates, there is a requirement that cell specificity be demonstrated and a further requirement that this specificity be shown to be conferred only by the combination of the interaction of the primary targeting antibody with target, and the interaction of the second step reagent (in this case the ScFv) with the primary antibody. To this end, the primary antibody needs to be recognised specifically by the mediator, and therefore requires stable modifications that will distinguish it from native antibodies. Multiple derivatisation of the primary antibody with a hapten fulfils this demand, and has the further advantage of amplification, providing an array of secondary targets for the mediator.

Of course, other mediators such as Fab, F(ab')₂, dAbs or other antibody fragments may be used. The mediator may also recognize the moiety in a non-immune sense, such as in biotin-streptavidin recognition. It is preferred if the moiety recognised is a small molecule, but the moiety may also be a polypeptide, peptide, oligosaccharide or the like.

The murine immune response to the haptens 4-hydroxy-3-nitrophenylacetic acid (NP) and 4-hydroxy-3-iodo-5-nitrophenylacetic acid (NIP) is dominated by well characterised V_(H) domains and a λ₁ light chain (Kabat et al (1987) Sequences of proteins of immunological interest, US Department of Health and Human Services, Public Health Services, National Institutes of Health). NP-specific V_(H) domains have been used in the construction of recombinant antibodies (Neuberger et al (1984) Nature 312, 604-608; Casadei et al (1990) Proc. Natl. Acad. Sci. USA 87, 2047-2051). The hapten itself is well studied and of some immunological interest (Brownstone et al (1966) loc. cit.) and is also available commercially in a variety of chemical forms. It is relatively simple to conjugate NP or NIP to other proteins including antibodies.

We describe in the Examples the construction and characterisation of a ScFv with an affinity in the range of 1-3×10⁸ M⁻¹ at pH 7.4 for NIP conjugated to BSA, sufficiently high that the molecule is suitable as a second step targeting reagent. Derivatisation with hapten resulted in reduction in immunoreactivity of the primary antibody, but even under these adverse circumstances the hapten-conjugated antibody was still capable of delivering ScFv_(NP) specifically to cells. Since about forty hapten molecules were conjugated, on average, to each mAb molecule, there is still a potential 40-fold amplification provided. The specificity of targeting is governed by the interactions of primary antibody with target, and the ScFv_(NP) with derivatised primary antibody, since the ScFv does not bind cells and non-derivatised antibodies bound at cells cannot capture the ScFv. The ScFv described here can therefore be considered as a universal agent for delivery of drugs or radionuclides or other cytotoxic agents to any cell type for which a previously characterised antibody exists.
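For orientation, the association constants quoted above can be restated as dissociation constants using K_(d) = 1/K_(a), and the potential amplification can be expressed as an upper bound on the number of ScFv molecules captured per bound primary antibody. This is a minimal sketch: the affinity range and the figure of about forty haptens per mAb are from the text, while the function names and the `accessible_fraction` parameter are assumptions introduced for illustration.

```python
# Illustrative sketch only. Input values are from the text; names and the
# accessible_fraction parameter are assumptions.
KA_RANGE_PER_M = (1e8, 3e8)  # association constants for NIP-BSA at pH 7.4
HAPTENS_PER_MAB = 40         # average hapten molecules conjugated per mAb


def kd_from_ka(ka_per_molar):
    """Dissociation constant (M) as the reciprocal of the association constant (M^-1)."""
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / ka_per_molar


def max_scfv_per_bound_mab(haptens_per_mab=HAPTENS_PER_MAB, accessible_fraction=1.0):
    """Upper bound on ScFv molecules captured by one cell-bound, derivatised mAb."""
    if not 0.0 <= accessible_fraction <= 1.0:
        raise ValueError("accessible_fraction must lie between 0 and 1")
    return haptens_per_mab * accessible_fraction


kd_range_nM = tuple(kd_from_ka(ka) * 1e9 for ka in KA_RANGE_PER_M)
print("K_d range (nM):", [round(kd, 2) for kd in kd_range_nM])
print("Maximum ScFv per bound mAb:", max_scfv_per_bound_mab())
```

The stated affinity range thus corresponds to dissociation constants of roughly 3 to 10 nM; the 40-fold figure is a ceiling that would be lowered if some haptens were sterically inaccessible.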

In this aspect of the invention, the cytotoxic portion joined to the mediator portion may be a drug, pro-drug, radionuclide, protein including an enzyme, antibody or any other therapeutically useful reagent.

Thus, the drug may be a cytotoxic chemical compound such as methotrexate, adriamycin, vinca alkaloids (vincristine, vinblastine, etoposide), daunorubicin or other intercalating agents. The enzyme, or enzymatic portion thereof, may be directly cytotoxic, such as DNase I or RNase, or indirectly cytotoxic such as an enzyme which converts a substantially non-toxic pro-drug into a toxic form. The protein may be ricin. The cytotoxic portion may comprise a highly radioactive atom, such as iodine-131, rhenium-186, rhenium-188 or yttrium-90, which emits enough energy to destroy neighboring cells.

An indirectly cytotoxic portion may be a small-molecule binding site wherein the said small-molecule is capable of being conjugated to any of the following cytotoxic compounds: radioactive compound; drug; pro-drug; radionuclide; protein including enzyme; antibody; or toxin.

We hereby disclose the principle that ScFvs are suitable for indirect targeting. Moderating the degree of derivatisation of the primary antibody will reduce the loss of immunoreactivity of the primary antibody whilst still maintaining an array of secondary targets for the hapten-specific ScFv.

In a further embodiment, the cytotoxic portion of the compound comprises at least the biotin-binding portion of streptavidin as disclosed in Example 4.

The compounds of the invention are administered in any suitable way, usually parenterally, for example intravenously, intraperitoneally or, preferably (for bladder cancer), intravesically (ie into the bladder), in standard sterile, non-pyrogenic formulations of diluents and carriers, for example isotonic saline (when administered intravenously).

A further aspect of the invention provides a method of delivery of the compound of the invention which contains a binding site for a small molecule, and the administration of the said small molecule conjugated with any of the following: drug, pro-drug, radionuclide, enzyme, antibody or any other therapeutically useful reagent, to give the "small molecule conjugate".

Once the compound has bound to the target cells and been cleared from the bloodstream (if necessary), which typically takes a day or so, the small molecule conjugate is administered, usually as a single infused dose. If needed, because the compound of the invention may be immunogenic, cyclosporin or some other immunosuppressant can be administered to provide a longer period for treatment but usually this will not be necessary.

The timing between administrations of the compound and the small molecule conjugate may be optimised in a non-inventive way since target cell/normal tissue ratios of conjugate (at least following intravenous delivery) are highest after about 4-6 days, whereas at this time the absolute amount of antibody bound to the tumour, in terms of percent of injected dose per gram, is lower than at earlier times. Therefore, the optimum interval between administration of the compound and the small molecule conjugate will be a compromise between peak target concentration of enzyme and the best distribution ratio between target and normal tissues.

The dosage of the small molecule conjugate will be chosen by the physician according to the usual criteria. The dosage of the compound of the invention will similarly be chosen according to normal criteria, and, in the case of tumour treatment, particularly with reference to the type, stage and location of the tumour and the weight of the patient. The duration of treatment will depend in part upon the rapidity and extent of any immune reaction to the antibody or cytotoxic component of the compound.

A further aspect of the invention provides a method of delivery of the target cell-specific molecule and a compound of the invention which contains a mediator portion. Once the target cell-specific molecule has bound to the target cells and been cleared from the bloodstream (if necessary), which typically takes a day or so, the compound comprising a mediator portion is administered in any suitable way.

If the cytotoxic portion, joined to the mediator portion, contains a binding site for a small molecule, then, once the mediator-containing compound has bound to the target cell-specific molecule at the site of the target cell, and has been cleared from the bloodstream (if necessary), the said small molecule conjugate is administered as described supra.

The compounds of the invention either in themselves, or together with a target cell-specific molecule or additionally together with an appropriate toxic agent, capable of binding to the small molecule-binding site of the compound, are in principle suitable for the destruction of cells in any tumour or other defined class of cells selectively exhibiting a recognisable (surface) entity. The compounds are principally intended for human use but could be used for treating other mammals including dogs, cats, cattle, horses, pigs and sheep.

The small molecule conjugate, when used in combination with a compound for diagnosis, usually comprises a radioactive atom for scintigraphic studies, for example technetium 99m (^(99m)Tc) or iodine-123 (¹²³I), or a spin label for nuclear magnetic resonance (nmr) imaging (also known as magnetic resonance imaging, mri), such as iodine-123 again, iodine-131, indium-111, fluorine-19, carbon-13, nitrogen-15, oxygen-17, gadolinium, manganese or iron.

When used in combination with a compound for selective destruction of the tumour, the small molecule conjugate may comprise a highly radioactive atom, such as iodine-131, rhenium-186, rhenium-188 or yttrium-90, which emits enough energy to destroy neighboring cells, or a cytotoxic chemical compound such as methotrexate, adriamycin, vinca alkaloids (vincristine, vinblastine, etoposide), daunorubicin and other intercalating agents or (preferably) an enzyme or enzymatic portion thereof which converts a non-toxic pro-drug into a toxic form. In the latter case, the compound of the invention is administered and, once there is an optimum balance between (i) the tumour to normal cell ratio of compound and (ii) the absolute level of compound associated with the tumour, the pro-drug is administered either systemically (eg intravenously) or intravesically, into the bladder. The enzyme/pro-drug systems of Bagshawe and his co-workers may be used (loc. cit.) or the antibody-alkaline phosphatase conjugates, followed by etoposide phosphate (loc. cit.) or, more preferably, the cyanide-liberating systems described by Epenetos (loc. cit.).

The compounds of the invention, together with an appropriate small molecule conjugated to a readily-detectable reagent such as a radionuclide, fluorescent molecule or enzyme, are in principle suited for the recognition of antigens in other situations. These include immunoblotting procedures, such as the well-known Western blot (Towbin et al (1979) Proc. Natl. Acad. Sci. USA 76, 4350-4354); assays such as the enzyme-linked immunosorbent assay (ELISA); and in situ hybridisation experiments in which the presence of antigens within fixed cells is detected.

In a further embodiment of the invention, a compound comprising an oligomeric complex of at least two molecules each comprising a target cell-specific portion and a further portion wherein the molecules are complexed to one another via their further portions is useful in agglutinating cells. In a preferred embodiment the target cell-specific portion of the compound of the invention recognizes particular blood group antigens displayed on the surface of the erythrocyte, and because of the multivalent binding nature of the compound, the addition of the compound to blood may lead to haemagglutination. Thus, in this embodiment the compounds may be specific to particular antigens within the ABO, Rhesus, Kell, or any other blood group systems, and the compound of the invention may find uses in blood group typing or other areas of tissue typing.

Antibodies, including monoclonal antibodies, are known which react with most of the aforementioned blood group antigens and it is well within the scope of a person skilled in the art to derive, for example, ScFvs from such antibodies for use in the invention.

The invention will now be described in detail with reference to the following figures and examples wherein:

FIG. 1(a), 1(b), 1(c), and 1(d) show diagrammatic representations of compounds in accordance with the invention.

FIG. 2 shows the construction of plasmids expressing ScFv_(NP).

FIG. 3 shows oligonucleotide primers used in the polymerase chain reaction to amplify various fragments of the ScFv coding region.

FIG. 4A and FIG. 4B show the nucleotide sequence (SEQ ID No 2) (and encoded protein sequence (SEQ ID No 3)) between the HindIII and EcoRI sites of pRAS107 and pRAS111.

FIG. 5 shows the binding of a soluble protein expressed from pRAS111 to NIP₁₅-BSA. The meaning of the symbols in FIGS. 5, 6 and 12 through 14 is given in FIG. 6.

FIG. 6 shows that a soluble protein expressed from pRAS111 and which binds NIP₁₅-BSA can be competed by NIP₁₅-BSA.

FIG. 7 shows the construction of plasmids expressing ScFv-streptavidin fusions in vitro.

FIG. 8 shows the construction of plasmids for the expression of ScFv-streptavidin fusions in E. coli.

FIG. 9A and FIG. 9B show the nucleotide sequence (SEQ ID No 4) (and deduced amino acid sequence (SEQ ID No 5)) between the HindIII and EcoRI sites of pRAS108 and pRAS112.

FIG. 10A and 10B show the nucleotide sequence (SEQ ID No 6) (and deduced amino acid sequence (SEQ ID No 7)) between the HindIII and EcoRI sites of pRAS109 and pRAS113.

FIG. 11A and 11B show the nucleotide sequence (SEQ ID No 8) (and deducedamino acid sequence (SEQ ID No 9)) between the HindIII and EcoRI sitesof pRAS110 and pRA114.

FIG. 12 shows the detection of soluble pRAS112-encoded protein (fulllength ScFv_(NP) -streptavidin monomer) in bacterial supernatants.

FIG. 13 shows that pRAS112-encoded protein binds to NIP₁₅ -BSA, but notto lysozyme.

FIG. 14A and FIG. 14B show that concentrated pRAS112-encoded proteinbinds iminobiotin-Sepharose at pH 11 in contrast to parental ScFv_(NP)protein that does not.

FIG. 15 shows a diagrammatic representation of pRAS112-encoded protein.

FIG. 16 shows the construction of plasmids expressing ScFv-BSRNasefusion molecules.

FIG. 17 shows a diagrammatic representation of a ScFv-BSRNaseheterodimer.

FIG. 18 shows the construction of plasmids expressing ScFv-DNAseI fusionmolecules.

FIG. 19 shows the purification of pRAS111 ScFv_(NP) protein.

FIG. 20A, FIG. 20B, FIG. 20C, FIG. 20D, FIG. 20E and FIG. 20F showindirect targeting of pRAS111 ScFv_(NP).

FIG. 21 shows the nucleotide sequence (SEQ ID No 10) of the ScFv-BSRNasefusion (anti-4-OH-nitrophenacetyl antibody) that has been insertedbetween the HindIII and EcoRI sites of plasmid pSP71.

FIG. 22 shows the nucleotide sequence (SEQ ID No 11) of the ScFv-BSRNasefusion (H17-BSRNase; anti-human placental alkaline phosphatase antibody;H17E2) that has been inserted between the HindIII and EcoRI sites ofplasmid pSP71.

FIG. 23 shows the nucleotide sequence (SEQ ID No 12) of the ScFv-BSRNasefusion (anti-lysozyme antibody) that has been inserted between theHindIII and EcoRI sites of a plasmid pUC18.

FIG. 24 shows the nucleotide sequence (SEQ ID No 13) of the ScFv-DNaseIfusion (anti-4-OH nitrophenacetyl antibody) that has been insertedbetween the HindIII and BglI sites of plasmid pSP71.

FIG. 25 shows the nucleotide sequence (SEQ ID No 14) of the ScFv-DNaseIfusion (anti-human placental alkaline phosphatase antibody; H17E2) thathas been inserted between the HindIII and BglI sites of plasmid pSP71.

FIG. 26 shows the nucleotide sequence (SEQ ID No 15) of the ScFv-DNaseIfusion (anti-lysozyme antibody) that has been inserted between theHindIII and BglI sites of plasmid pUC18.

FIG. 27 shows the results of cell-killing experiments using HEp2 cellsand the fusion protein H17-DT-BSR, H17-DT-BSR/KDEL andH17-DT-BSR/KDELINK.

FIG. 28 is a schematic diagram of the H17E2 scFv-seminal RNase fusionproteins. The plasmid which express them are named in parentheses.

FIG. 29 shows the nucleotide sequence (SEQ ID No 24) encoding the H17E2scFv-diptheria toxin disulphide loop-BSRNase (H17-Dip. Tox.-BSRNase).

FIG. 30 shows the nucleotide sequence (SEQ ID No 25) encoding the H17E2scFv-diptheria toxin disulphide loop-BSRNase-KDEL (H17-Dip. Tox.-BSRNaseKDEL).

FIG. 31 shows the nucleotide sequence (SEQ ID No 26) encoding the H17E2ScFv diptheria toxin disulphide loop-Linker-BSRNase-KDEL (H17-Dip.Tox.-link-BSRNase KDEL).

FIG. 32 shows the nucleotide sequence (SEQ ID No 27) encoding the H17E2ScFv-Linker-BSRNase-KDEL (H17-LBSRNase-KDEL).

FIG. 33 shows the nucleotide sequence (SEQ ID No 28) encoding the H17E2ScFv-BSRNase KDEL).

FIG. 34 shows the elution of pRAS111 and pRAS112 proteins fromNP-sepharose with 50 mM glycine HCl, pH 2.2.

EXAMPLE 1 Construction of a Single-chain Fv (ScFv) Reactive Against the Hapten NP (4-OH Nitrophenacetyl)

Plasmid Constructions

Plasmids are shown in FIG. 2. Filled circles represent promoters: P_(lac), lac promoter of pUC plasmids; P_(SP6), SP6 promoter; P_(T7), T7 promoter. Open boxes represent fused gene portions: pelB, the signal sequence derived from the pectate lyase B gene of Erwinia carotovora; (G₄S)₃, flexible oligopeptide linker comprising three tandem repeats of N-GlyGlyGlyGlySer-C (SEQ ID No 16); myc, a small immunogenic tag derived from c-myc. Restriction enzyme sites: B, BamHI; Bs, BstEII; b, BglII; C, ClaI; E, EcoRI; H, HindIII; K, KpnI; P, PstI; Sp, SphI; Ss, SstI; X, XhoI.

Plasmid pSWsFvD1.3myc (McCafferty et al (1990) Nature 348, 552-554) encodes a single-chain Fv reactive against hen egg lysozyme, which comprises VH_(D1.3) and Vκ_(D1.3) domains linked by a flexible oligopeptide, (G₄S)₃, under the transcriptional control of the lac promoter of E. coli. The region encoding Vκ_(D1.3) was replaced by one encoding Vλ in the following manner. The segment encoding VH_(D1.3) (G₄S)₃ was subjected to polymerase chain reaction (PCR) mediated amplification using oligonucleotide primers VHBACK2 (SEQ ID No 17) and BAMLINKERFOR (SEQ ID No 18) (FIG. 3). Primer BAMLINKERFOR directs the incorporation of a BamHI site that also encodes the two carboxy-terminal amino acids of the flexible oligopeptide linking the two V domains.
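The dual role of the BamHI site can be illustrated with a short Python sketch. It spells out the (Gly₄Ser)₃ linker and shows that the BamHI recognition sequence GGATCC, read as two codons from its first base, gives Gly-Ser, the last two residues of the linker. The function names are illustrative only, and the reading frame is an assumption, since the primer sequences themselves appear only in FIG. 3.

```python
# Minimal sketch: the (G4S)3 linker and a BamHI site read as Gly-Ser.
# Function names are hypothetical; the reading frame of the site is assumed.

# Only the two codons needed here, taken from the standard genetic code.
CODONS = {"GGA": "Gly", "TCC": "Ser"}

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def g4s_linker(repeats: int = 3) -> str:
    """Return the one-letter sequence of Gly4Ser repeated `repeats` times."""
    return "GGGGS" * repeats


def translate_in_frame(dna: str) -> list[str]:
    """Translate a DNA string codon by codon, starting at its first base."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


linker = g4s_linker()
site_residues = translate_in_frame(BAMHI_SITE)

print(linker)
print(site_residues)
```

Placing the restriction site inside the linker in this way allows the two V-domain segments to be joined without adding residues to the linker.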

A Vλ gene segment was amplified from chromosomal DNA of plasmacytoma J558L using primer pair BAMVλBACK (SEQ ID No 19) and ECOVλFOR (SEQ ID No 20). The former directs the incorporation of a BamHI site at the 5' end of the gene; the latter directs the incorporation of two stop codons and XhoI and EcoRI sites at the 3' end of the gene.

The two amplified products were used to replace the PstI-EcoRI fragment of plasmid pRAS103 to generate plasmid pRAS106, which encodes a ScFv protein comprising VH_(D1.3) (G₄S)₃ Vλ_(J558L) under the transcriptional control of the SP6 promoter.

The PstI-BstEII fragment of pRAS106 was replaced with a PstI-BstEII fragment encoding VH_(NP) amplified from plasmid pRAS49 (Spooner and Lord (1991) loc. cit.) using primers VHBACK3 (SEQ ID No 21) and VH1FOR-2 (SEQ ID No 22) to generate plasmid pRAS107. This bears a VH_(NP) (G₄S)₃ Vλ_(J558L) ScFv under the transcriptional control of the SP6 promoter, and is intended purely for expression in in vitro systems.

Plasmid pRAS111 bears the ScFv of pRAS107, but under T7 promoter control, and is suitable for expression in both in vitro systems and bacterial systems.

The nucleotide sequence (and deduced amino-acid sequence) between the HindIII and EcoRI sites of plasmids pRAS107 and pRAS111 is given in FIG. 4.

TABLE 3
Plasmids used
________________________________________________________________________________________
Plasmid        Relevant characteristics          Source or reference
________________________________________________________________________________________
pSWsFvD1.3myc  Anti-lysozyme ScFv,               McCafferty et al (1990) loc. cit.
               VH_(D1.3) (G₄S)₃ Vκ_(D1.3)
pRAS103        Anti-lysozyme ScFv-ricin A        Spooner et al (1992) pp 7-15 in
               chain fusion, lac promoter        Monoclonal Antibodies 2; Applications
                                                 in Clinical Oncology (Epenetos, A. A.,
                                                 Ed), Chapman & Hall
pRAS106        VH_(D1.3) (G₄S)₃ Vλ_(J558L),      This application
               SP6 promoter
pRAS49         Anti-NP antibody H chain-ricin    Spooner and Lord (1991) loc. cit.
               A chain fusion, IgH promoter
pRAS107        VH_(NP) (G₄S)₃ Vλ_(J558L),        This application
               SP6 promoter
pRAS111        VH_(NP) (G₄S)₃ Vλ_(J558L),        This application
               T7 promoter
________________________________________________________________________________________

Growth of Plasmacytoma J558L and DNA Preparation

Mouse plasmacytoma J558L cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum. Cells were washed twice in standard phosphate-buffered saline pH 7.4 (PBS) and high molecular weight DNA was prepared by addition, with gentle vortexing, of 10⁶ cells suspended in 100 μl PBS to 2.5 ml 10 mM Tris-HCl 1 mM EDTA pH 8.0 containing 0.02% (w/v) SDS. After adding Proteinase K to 1 mg.ml⁻¹, incubation (3 h, 50° C.) and two phenol/chloroform extractions, DNA was precipitated with ethanol, and dissolved overnight at 4° C. in 1 ml 10 mM Tris-HCl 1 mM EDTA pH 8.

Polymerase Chain Reaction

Plasmid or chromosomal DNA (100 ng) was subjected to 24 rounds of PCR-mediated amplification (94° C., 1 min; 65° C., 1.5 min; and 72° C., 2 min) in 50 μl reaction volumes containing 25 pmol of each appropriate oligonucleotide primer, 250 μM of each dNTP, 67 mM Tris-HCl (pH 8.8), 17 mM (NH₄)₂SO₄, 1.5-6 mM MgCl₂, 200 mg.ml⁻¹ gelatin and 5 units of Taq polymerase (Cetus) overlaid with 25 μl paraffin oil. Amplified DNA was extracted once with phenol/chloroform and precipitated with ethanol before use.
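As a rough aid to reading the cycling conditions, the following Python sketch totals the programmed hold time and converts the primer amount into a molar concentration. The class and function names are illustrative only, and ramp times between temperatures are ignored, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold within a PCR round."""
    temp_c: float
    minutes: float


# Values taken from the protocol above.
CYCLE = (Step(94, 1.0), Step(65, 1.5), Step(72, 2.0))
ROUNDS = 24
REACTION_VOLUME_UL = 50.0
PRIMER_PMOL = 25.0


def total_hold_minutes(cycle: tuple[Step, ...], rounds: int) -> float:
    """Sum of programmed hold times over all rounds (ramping ignored)."""
    return rounds * sum(step.minutes for step in cycle)


def primer_concentration_um(pmol: float, volume_ul: float) -> float:
    """pmol per microlitre is numerically equal to micromolar."""
    return pmol / volume_ul


print(total_hold_minutes(CYCLE, ROUNDS), "min of programmed holds")
print(primer_concentration_um(PRIMER_PMOL, REACTION_VOLUME_UL), "uM of each primer")
```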

Bacterial Expression of pRAS111 Protein

E. coli K12 JM109(DE3), a JM109 derivative with a chromosomal insertion of T7 polymerase under lac transcriptional control, was transformed with plasmid pRAS111. Cells were grown to a density of 10⁷ cells.ml⁻¹ and expression of pRAS111 protein was achieved by induction of T7 polymerase with 100 nM IPTG. A 31 kDa protein accumulates in the cells in sufficient quantity for provisional identification by Coomassie staining of cell extracts. The identity is confirmed by Western blotting, probing with biotinylated goat anti-mouse lambda (Gαmλ) antiserum.

In addition, E. coli K12 BL21(DE3), a derivative of BL21 with a single chromosomal copy of T7 RNA polymerase under lacUV5 promoter control (Studier and Moffatt (1986) J. Mol. Biol. 189, 113-130), was transformed with plasmid pRAS111. Cultures (400 ml) were grown at 37° C. or at room temperature in minimal salts medium supplemented with 100 μg.ml⁻¹ ampicillin and 1% glucose or in L-broth supplemented with 100 μg.ml⁻¹ ampicillin, to a density of 10⁷ cells.ml⁻¹. Expression of pRAS111 ScFv protein was achieved by induction of T7 polymerase with 100 nM IPTG. After induction, cells were grown for 24 h to permit accumulation of pRAS111 ScFv protein in the growth medium.

Biological Activity and Affinity Purification of pRAS111 Protein

Filtered bacterial supernatants were applied to wells of a 96-well plate previously coated with 10 μg.ml⁻¹ NIP₁₅-BSA or 300 μg.ml⁻¹ hen egg lysozyme, and bound protein was detected by serial incubation with biotinylated Gαmλ antiserum and HRPO-streptavidin conjugate. Colour changes were generated by incubation with ABTS and were monitored at 405 nm.

A soluble protein present in the growth medium of JM109(DE3)/pRAS111 cultures, but not in cultures of JM109(DE3), binds NIP₁₅-BSA, but not lysozyme (FIG. 5). Filtered bacterial growth medium recovered after induction of pRAS111 protein was applied to wells of an ELISA plate coated with 10 μg.ml⁻¹ NIP₁₅-BSA (⊙) or 300 μg.ml⁻¹ hen egg lysozyme (♦). Bound protein was detected by serial incubation with biotinylated Gαmλ (goat anti-mouse lambda light chain) antisera and horseradish peroxidase conjugated streptavidin diluted in blocking buffer, and colour changes generated by addition of ABTS were monitored at 405 nm.

The binding of this soluble protein to NIP₁₅-BSA can be competed with NIP₁₅-BSA (FIG. 6). ScFv protein was allowed to bind ELISA wells coated with 10 μg.ml⁻¹ NIP₁₅-BSA in the absence of competing hapten, or in the presence of 0.010 μg.ml⁻¹ (⋄), 0.019 μg.ml⁻¹ (×), 0.039 μg.ml⁻¹ (-), 0.078 μg.ml⁻¹ (+), 0.156 μg.ml⁻¹ (♦), 0.313 μg.ml⁻¹ (≮), 0.625 μg.ml⁻¹ (Δ), 1.25 μg.ml⁻¹ (▪), 2.5 μg.ml⁻¹ (□), 5 μg.ml⁻¹ () or 10 μg.ml⁻¹ (◯) competing hapten. Bound protein was detected by serial incubation with biotinylated Gαmλ antisera and horseradish peroxidase conjugated streptavidin, and colour changes generated by addition of ABTS were monitored at 405 nm.

The ScFv encoded by pRAS111 was found to have a binding affinity for NP of K_(d) = 4×10⁻⁹ M. Since bivalency of an antibody commonly provides an extra three orders of magnitude of binding ability, an avidity of at least 10⁻¹² M would be predicted for bivalent molecules derived from ScFv_(NP).
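The competitor concentrations listed for FIG. 6 appear to form a two-fold dilution series starting at 10 μg.ml⁻¹, and the avidity prediction amounts to dividing the measured K_(d) by the stated three orders of magnitude. The Python sketch below checks both points; the function names are illustrative only and are not part of the original work.

```python
# Competitor concentrations (ug/ml) as listed in the text, highest first.
LISTED_UG_PER_ML = [10, 5, 2.5, 1.25, 0.625, 0.313, 0.156, 0.078, 0.039, 0.019, 0.010]

KD_MONOVALENT_M = 4e-9   # K_d of the pRAS111 ScFv for NP, from the text
BIVALENCY_GAIN = 1e3     # "three orders of magnitude", as stated in the text


def twofold_series(top: float, points: int) -> list[float]:
    """Return `points` concentrations, halving from `top` at each step."""
    return [top / 2 ** i for i in range(points)]


def predicted_bivalent_kd(kd: float, gain: float) -> float:
    """Divide the monovalent K_d by the assumed avidity gain."""
    return kd / gain


series = twofold_series(10.0, len(LISTED_UG_PER_ML))

# Largest gap between the listed values and an exact two-fold series.
max_gap = max(abs(a - b) for a, b in zip(series, LISTED_UG_PER_ML))
bivalent_kd = predicted_bivalent_kd(KD_MONOVALENT_M, BIVALENCY_GAIN)

print(f"largest deviation from a two-fold series: {max_gap:.4f} ug/ml")
print(f"predicted bivalent K_d: {bivalent_kd:.1e} M")
```

Under these assumptions the listed values agree with an exact two-fold series to within rounding in the third decimal place.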

As an alternative, growth medium, filtered through 0.2 μm nitrocellulose filters to remove cells and particulates, was adjusted to 80% saturation with solid ammonium sulphate at 4° C. After incubation (4° C., 1 h) treated medium was centrifuged (10,000×g, 30 min) to pellet insoluble proteins. Pellets were taken up in 20 ml PBS and were dialysed exhaustively against PBS at 4° C. Insoluble material after dialysis was removed by brief centrifugation and the remainder was adjusted to 40 ml final volume with PBS, to 0.02% with sodium azide and was applied slowly (2 ml h⁻¹) to a 2 ml NP-Sepharose column at room temperature. After washing with 50 column volumes of PBS containing 0.02% sodium azide (PBS/azide), bound proteins were eluted with 50 mM glycine-HCl pH 2.2 and fractions (2 ml) were immediately adjusted by addition of 200 μl 2 M unbuffered Tris base. Fractions containing ScFv protein were pooled, dialysed against PBS and concentrated using Macrosep (Amicon) concentrators with a 10 kDa cut-off. Yields were estimated by Bio-Rad protein assay, using rabbit IgG as a reference, and by absorbance at 280 nm assuming A₂₈₀ = 1 for 1.4 mg.ml⁻¹ solution.
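The quantities in this purification step can be worked through with a few lines of Python. The sketch below derives the column loading time, the wash volume, the Tris concentration in each neutralised fraction and a protein concentration from an A₂₈₀ reading, using only figures given above. The function names and the example A₂₈₀ reading of 0.5 are hypothetical.

```python
# Conversion stated in the text: A280 = 1 corresponds to 1.4 mg/ml.
MG_PER_ML_PER_A280 = 1.4


def loading_time_h(sample_ml: float, flow_ml_per_h: float) -> float:
    """Time needed to apply the sample at the given flow rate."""
    return sample_ml / flow_ml_per_h


def wash_volume_ml(column_ml: float, column_volumes: float) -> float:
    """Total wash volume for a given number of column volumes."""
    return column_ml * column_volumes


def final_molarity(stock_m: float, added_ul: float, fraction_ml: float) -> float:
    """Molarity of an additive after it is mixed into a fraction."""
    added_ml = added_ul / 1000.0
    return stock_m * added_ml / (fraction_ml + added_ml)


def concentration_from_a280(a280: float) -> float:
    """Protein concentration (mg/ml) using the conversion stated in the text."""
    return a280 * MG_PER_ML_PER_A280


print(loading_time_h(40.0, 2.0), "h to load 40 ml at 2 ml/h")
print(wash_volume_ml(2.0, 50.0), "ml of PBS/azide wash")
print(round(final_molarity(2.0, 200.0, 2.0), 3), "M Tris in each neutralised fraction")
print(concentration_from_a280(0.5), "mg/ml for a hypothetical A280 of 0.5")
```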

Soluble NIP-binding activity was detected by ELISA analysis of bacterial growth medium after induction, and could be concentrated by ammonium sulphate precipitation and purified by affinity chromatography on NP-Sepharose (FIG. 19), so no attempt was made to recover pRAS111 ScFv protein present in cell pellets. Yields of pRAS111 ScFv from growth medium were not greatly different when induced at room temperature or 37° C. Induction of expression was efficient in minimal salts medium and not discernible in rich broth; however, little difference was noted in the final yields. The most important factor found here was the bacterial strain, with yields of ~1.3 mg l⁻¹ pRAS111 ScFv protein recovered from cultures of BL21(DE3)/pRAS111, approximately ten-fold greater than those obtained from cultures of JM109(DE3)/pRAS111.

Specificity of pRAS111 Protein

The screening agent used here, biotinylated Gαmλ antiserum, also detects the λ1 light chain of anti-NP/NIP antibodies. It is therefore not possible to demonstrate specificity of pRAS111 ScFv protein for NP or NIP by competition with anti-NP/NIP antibodies, but only by its ability to recognize NP/NIP. A soluble protein present in growth medium of JM109(DE3)/pRAS111 cultures, but not in untransformed cultures of JM109(DE3), binds NIP₁₆-BSA, but not lysozyme, and can be competed with NIP₁₆-BSA (FIGS. 5 and 6). This activity can be retained on NP-Sepharose columns, from where it can be eluted. In addition, targeting studies demonstrate no cross-reaction with BSA, PEPY2-BSA, antibody or mammalian cells.

Affinity Determinations

Results of affinity determinations using ELISA-based techniques are given in Table 4. Affinity of pRAS111 ScFv for NIP was estimated firstly by adapting the method of Mariani et al ((1987) Molec. Immunol. 24, 297-303), determining the concentration of total added antibody giving half-maximal binding (C_(t50)) assuming C_(t50) = 1/K_(app), where K_(app) is the apparent affinity constant. This approximation only holds true if the number of available binding sites per well is sufficiently low that their contribution is insignificant. Determinations of K_(app) should approach K_(actual) as the amount of antigen per well is reduced. Table 4 shows that a point is reached where similar values of K_(app) are derived (K = 2-3×10⁸ M⁻¹), representing the closest approximation that can be made using this method.
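The relation C_(t50) = 1/K_(app) can be illustrated with a short Python sketch that locates the half-maximal point of a titration curve by linear interpolation and inverts it. This is a minimal illustration only and does not reproduce the adapted procedure itself. The function names are hypothetical, the plateau absorbance is assumed to be known, and the data are synthetic values generated from a simple saturation curve, not measurements.

```python
def half_maximal_concentration(concs, absorbances, plateau):
    """Interpolate the concentration at which absorbance equals plateau / 2.

    `concs` must be ascending and `absorbances` non-decreasing.
    """
    target = plateau / 2.0
    points = list(zip(concs, absorbances))
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= target <= a1 and a1 > a0:
            return c0 + (target - a0) * (c1 - c0) / (a1 - a0)
    raise ValueError("half-maximal absorbance is not bracketed by the data")


def apparent_affinity(ct50_molar):
    """K_app = 1 / C_t50, valid only when binding sites per well are scarce."""
    return 1.0 / ct50_molar


# Synthetic titration (not measured data): A = plateau * c / (c + 4e-9 M).
PLATEAU = 1.0
concs = [0.5e-9, 1e-9, 2e-9, 4e-9, 8e-9, 16e-9, 32e-9]
signal = [PLATEAU * c / (c + 4e-9) for c in concs]

ct50 = half_maximal_concentration(concs, signal, PLATEAU)
k_app = apparent_affinity(ct50)
print(f"C_t50 = {ct50:.2e} M, K_app = {k_app:.2e} M^-1")
```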

To confirm the accuracy of this approach, similar estimations of K were made using the method of Hogg et al ((1987) Molec. Immunol. 24, 797-801) in the absence of competing antigen, by calculating the slope of the linear portion of a plot of A₄₅₀/[ScFv_(NP)] versus A₄₅₀, where A₄₅₀/[ScFv_(NP)] = fKn - fK(A₄₅₀), A₄₅₀ is the absorbance at 450 nm, [ScFv_(NP)] is the concentration of added ScFv_(NP), n is the concentration of available binding sites and f is the valency of the ScFv_(NP) for NIP. A value of 1 was assigned to f.
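Because the plot is linear with slope -fK and intercept fKn, K can be recovered by an ordinary least-squares fit. The Python sketch below shows the calculation on synthetic, noise-free data generated from the same equation. The function names and the chosen values of K and n are hypothetical, and real data would first need the linear portion of the plot to be selected.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hogg_estimate(scfv_molar, a450, valency=1.0):
    """Estimate K (M^-1) and n from a plot of A450/[ScFv] against A450."""
    ratios = [a / c for a, c in zip(a450, scfv_molar)]
    slope, intercept = fit_line(a450, ratios)
    k = -slope / valency      # slope = -fK
    n = -intercept / slope    # intercept = fKn
    return k, n


# Synthetic data (not measurements), generated from A/[S] = fKn - fK*A with f = 1.
TRUE_K, TRUE_N = 2.5e8, 1.2
concs = [1e-9, 2e-9, 4e-9, 8e-9, 16e-9]
signal = [TRUE_K * TRUE_N * c / (1 + TRUE_K * c) for c in concs]

k_est, n_est = hogg_estimate(concs, signal)
print(f"K = {k_est:.3e} M^-1, n = {n_est:.3f}")
```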

TABLE 4
Affinity determinations of pRAS111 ScFv_(NP) protein
______________________________________________________________________________
                                  K (M⁻¹)
Antigen      concn (mg ml⁻¹)      Mariani et al         Hogg et al
coat         of coating buffer    (1987)                (1987)
______________________________________________________________________________
NIP₁₆-BSA    5                    2.5 (±0.1) × 10⁹      1.6 (±0.1) × 10⁹
NIP₁₆-BSA    1                    8.2 (±1.5) × 10⁸      8.1 (±0.7) × 10⁸
NIP₄-BSA     10                   2.9 (±0.3) × 10⁸      1.2 (±0.2) × 10⁸
NIP₄-BSA     5                    2.5 (±0.5) × 10⁸      1.8 (±0.1) × 10⁸
______________________________________________________________________________

Preparation of NP-Sepharose

Sepharose support (20 ml) with an amine function (Affigel 102, Biorad) was washed and suspended by addition of 20 ml 40 mM triethylamine. To this was added 430 mg NP-cap-OSu (Cambridge Research Biochemicals) dissolved in 1 ml dimethylformamide (DMF). After mixing by gentle inversion (2 h, room temperature) and extensive washing in water and then PBS, NP-Sepharose was equilibrated in PBS/azide and stored in the dark at 4° C.

Western Blots to Identify pRAS111 Protein

Western blots were performed as previously described (Spooner and Lord (1991) pp 65-77 in Monoclonal Antibodies; Applications in Clinical Oncology (Epenetos, A. A., Ed) Chapman and Hall) and pRAS111 ScFv protein was identified by serial incubations in PBS/5% milk powder/0.1% Tween 20 (blocking solution), biotinylated Gαmλ antisera and streptavidin-HRPO diluted in blocking solution to concentrations recommended by the suppliers. After each incubation, blots were washed 5 times in PBS/0.1% Tween 20. Proteins bound by biotinylated Gαmλ antisera and streptavidin-HRPO were revealed by incubation with DAB.

EXAMPLE 2 Derivatisation of Proteins with Hapten

NIP-cap-OSu (Cambridge Research Biochemicals) was dissolved in dimethylformamide to 20 mg.ml⁻¹ and added to proteins as below.

NIP-BSA: for low coupling ratio, 80 μl 20 mg.ml⁻¹ NIP-cap-OSu/DMF was added to 1 ml 200 mg.ml⁻¹ BSA in 10 mM triethylamine. For high coupling ratio, 800 μl 20 mg.ml⁻¹ NIP-cap-OSu/DMF was added to 1 ml 200 mg.ml⁻¹ BSA in 100 mM triethylamine.

NIP-antibody: 200 μl 20 mg.ml⁻¹ NIP-cap-OSu/DMF was added to 2 ml 2.8 mg.ml⁻¹ antibody (AUA1 or HMFG1, Unipath) in PBS/40 mM triethylamine.

After mixing by inversion (2 h, room temperature) and extensive dialysis against PBS, insoluble material was removed by centrifugation. Soluble NIP-BSA was adjusted to 0.02% with sodium azide. Soluble NIP-antibody was sterilised by filtration (0.2 μm filter). Haptenated proteins were stored at 4° C. in the dark. Protein concentration was estimated by Bio-Rad protein assay.

The number of haptens conjugated to each protein molecule was estimated by absorbance at 430 nm according to Brownstone et al (1966) Immunology 10, 465-479: low coupling ratio NIP-BSA, 3.7 (NIP₄-BSA); high coupling ratio NIP-BSA, 16.4 (NIP₁₆-BSA); NIP-AUA1, 38.3 (NIP₃₈-AUA1) and NIP-HMFG1, 35.4 (NIP₃₅-HMFG1).

EXAMPLE 3 Indirect Targeting Using pRAS111 ScFv Protein

The measured affinity of ScFv_(NP) or pRAS111 protein is sufficiently high to contemplate cell targeting by a two-step approach. Cells (LOVO and HT29) and peptide (PEPY2-BSA) were incubated with AUA1, NIP₃₈-AUA1, or NIP₃₅-HMFG1, and bound material was detected by incubation with sheep anti-mouse antisera (Shαm) conjugated to HRPO or by serial incubation with biotinylated Gαmλ and streptavidin-HRPO (FIG. 20). LOVO cells, which express AUA1 antigen, can be identified by serial incubation with specific antibody (AUA1) and with Shαm conjugated to HRPO. Hapten-derivatised NIP₃₈-AUA1 displayed a marked reduction in cell-binding ability, with loss of approximately 90% of immunoreactivity. Hapten-conjugated NIP₃₅-HMFG1 also bound LOVO cells, reflecting the ability of HMFG1 to bind these cells when presented at high concentration. When pRAS111 ScFv_(NP) protein was used as a detection layer, hapten-derivatised NIP₃₈-AUA1 and NIP₃₅-HMFG1 were both recognised, but non-hapten-conjugated AUA1 was not. Similar results were obtained with a different cell line, HT29, that also expresses AUA1 antigen.

When the specificity of the system was altered completely, using a peptide (PEPY2) derived from the protein backbone of polymorphic epithelial mucin, peptide incubated with NIP₃₅-HMFG1 antibody was bound by pRAS111 ScFv protein, whilst peptide incubated with AUA1 or NIP₃₈-AUA1 was not.

The specificity of pRAS111 ScFv_(NP) protein is therefore dependent upon prior targeting with a hapten-derivatised primary targeting vehicle, and the specificity of targeting depends only upon the interaction of primary hapten-conjugated targeting vehicle and the interaction of second step ScFv with the primary targeting vehicle.

For ELISAs using fixed mammalian cells, cells were seeded into wells of 96-well microculture plates at 10⁵ cells.ml⁻¹ in RPMI supplemented with 10% fetal calf serum and were grown to confluence at 37° C. in a 5% CO₂ atmosphere. Cells were washed twice in PBS, were incubated in 0.25% glutaraldehyde in PBS (100 μl per well, room temperature, 15 min) and, after a further wash in PBS, were stored at 4° C. in PBS/azide.

Unbound sites were blocked (30 min, room temperature) using 1% milk powder reconstituted in PBS containing 0.1% Tween 20 (blocking buffer). Antibodies and hapten-conjugated antibodies were applied and were detected by serial incubation with pRAS111 ScFv protein, biotinylated Gαmλ antisera and streptavidin-HRPO or by incubation with horseradish peroxidase conjugated sheep anti-mouse serum, diluted in blocking buffer to appropriate concentrations. After each incubation, plates were washed 5 times in PBS containing 0.1% Tween 20. Colour changes were generated using ABTS (monitored at 405 nm) or OPD (monitored at 450 nm).

The results of indirect targeting of pRAS111 ScFv_(NP) are shown in FIG. 20.

Binding of AUA1 (open circles), NIP₃₈-AUA1 (closed circles) and NIP₃₅-HMFG1 (open triangles) to LOVO cells, HT29 cells and to a peptide derived from the mucin backbone conjugated to BSA (PEPY2-BSA). Bound primary antibody was detected using HRPO-conjugated sheep anti-mouse antisera (Shαm) or by recognition using pRAS111 ScFv_(NP) (ScFv).

EXAMPLE 4 Construction of High Avidity ScFv-streptavidin Fusion

Plasmid Constructions

Plasmids for the in vitro expression of ScFv-streptavidin fusions are shown in FIG. 7. Filled circles represent promoters: P_(SP6), SP6 promoter; P_(T7), T7 promoter. Open boxes represent fused gene portions: pelB, the signal sequence derived from the pectate lyase B gene of Erwinia carotovora; (G₄S)₃, flexible oligopeptide linker comprising three tandem repeats of N-GlyGlyGlyGlySer-C; A-P, a novel flexible oligopeptide linker.

Restriction enzyme sites: B, BamHI; Bs, BstEII; b, BglII; C, ClaI; E, EcoRI; H, HindIII; K, KpnI; P, PstI; Sp, SphI; Ss, SstI; X, XhoI; Xb, XbaI.

Plasmids for the expression of ScFv-streptavidin fusions in E. coli are shown in FIG. 8. Filled circles represent promoters: P_(SP6), SP6 promoter; P_(T7), T7 promoter. Open boxes represent fused gene portions: pelB, the signal sequence derived from the pectate lyase B gene of Erwinia carotovora; (G₄S)₃, flexible oligopeptide linker comprising three tandem repeats of N-GlyGlyGlyGlySer-C; A-P, a novel flexible oligopeptide linker. Restriction enzyme sites: B, BamHI; Bs, BstEII; b, BglII; C, ClaI; E, EcoRI; H, HindIII; K, KpnI; P, PstI; Sp, SphI; Ss, SstI; X, XhoI; Xb, XbaI.

Segments of DNA encoding mature streptavidin monomers or fragments were amplified by PCR and were used to replace the XhoI-EcoRI fragment of plasmid pRAS107 to generate plasmids pRAS108, pRAS109 and pRAS110, which encode ScFv_(NP)-streptavidin fusions under SP6 transcriptional control.

Plasmid pRAS108 encodes a ScFv_(NP) fused to a mature streptavidin monomer via a novel oligopeptide linker (APAAAPA (SEQ ID No 23)). Its product is expected to tetramerise via the streptavidin monomer moieties. Mature streptavidin often forms higher order complexes, probably through interaction of the amino-terminal and carboxy-terminal regions, which are thought to be flexible extensions. Many commercial preparations lack these, through natural proteolysis, and form stable tetramers. In order to mimic this, two further ScFv_(NP)-streptavidin derivatives were made, one borne on plasmid pRAS109, which lacks the 19 carboxy-terminal amino acids of streptavidin, and the other, on plasmid pRAS110, which further lacks the 12 amino-terminal amino acids of streptavidin. Plasmid pRAS110 thus encodes a ScFv_(NP) linked to "core" streptavidin monomers, typical of many commercial preparations.
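The three streptavidin portions differ only in how many terminal residues are removed, which the following Python sketch makes explicit. The function name is hypothetical, and the length of 159 residues used for the mature streptavidin monomer is an assumption that is not stated in this document, so the resulting residue numbers are indicative only.

```python
# ASSUMPTION: length of the mature streptavidin monomer (not given in the text).
ASSUMED_MATURE_LENGTH = 159

# (residues removed from the N-terminus, residues removed from the C-terminus),
# as described for each plasmid in the text.
TRUNCATIONS = {
    "pRAS108": (0, 0),
    "pRAS109": (0, 19),
    "pRAS110": (12, 19),
}


def retained_residues(mature_length: int, n_removed: int, c_removed: int):
    """Return (first residue, last residue, number of residues) retained."""
    first = n_removed + 1
    last = mature_length - c_removed
    return first, last, last - first + 1


for plasmid, (n_cut, c_cut) in TRUNCATIONS.items():
    first, last, count = retained_residues(ASSUMED_MATURE_LENGTH, n_cut, c_cut)
    print(f"{plasmid}: residues {first}-{last} ({count} residues)")
```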

Plasmids pRAS112, pRAS113 and pRAS114 are derived from plasmids pRAS108, pRAS109 and pRAS110 respectively, and code for ScFv_(NP)-streptavidin fusions under the transcriptional control of the T7 promoter.

The nucleotide sequence (and deduced amino-acid sequence) between the HindIII and EcoRI sites of plasmids pRAS108 and pRAS112 is given in FIG. 9, that of plasmids pRAS109 and pRAS113 in FIG. 10 and that of plasmids pRAS110 and pRAS114 in FIG. 11.

Bacterial Expression of pRAS112, pRAS113 and pRAS114 Proteins

In contrast to ScFv_(NP), in the conditions used, proteins encoded by plasmids pRAS112, pRAS113 and pRAS114 do not accumulate after induction in amounts sufficient for provisional identification by Coomassie staining. Western blotting of cell extracts and culture supernatants, probing with biotinylated Gαmλ antiserum and HRPO-streptavidin conjugate or rabbit α-streptavidin (RαS) antiserum and HRPO-donkey α-rabbit (DαR) antiserum, allows identification of fusion proteins of expected monomeric sizes. Very little ScFv_(NP)-core streptavidin accumulates after induction of expression of pRAS114 protein.

In non-reducing conditions, almost all of the ScFv-streptavidin material migrates with sizes corresponding to multimeric forms (at ~90 kDa for a dimer and 180 kDa for the tetramer). Note that in the conditions employed here, streptavidin itself exists mostly as higher order aggregates.
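The band sizes quoted above can be cross-checked with simple arithmetic, sketched here in Python. Dividing each band by its number of subunits gives the implied monomer size, and subtracting the 31 kDa reported for the pRAS111 ScFv in Example 1 gives a rough size for the linker plus streptavidin portion. The function name is hypothetical, and gel estimates of this kind are only approximate.

```python
SCFV_KDA = 31.0  # size of the pRAS111 ScFv reported in Example 1

# Number of subunits -> approximate band size (kDa) under non-reducing conditions.
OBSERVED_BANDS_KDA = {2: 90.0, 4: 180.0}


def implied_monomer_kda(band_kda: float, subunits: int) -> float:
    """Monomer size implied by a multimer band of the given subunit count."""
    return band_kda / subunits


monomers = {n: implied_monomer_kda(kda, n) for n, kda in OBSERVED_BANDS_KDA.items()}
fusion_partner_kda = monomers[4] - SCFV_KDA

for subunits, kda in monomers.items():
    print(f"{subunits} subunits: implied monomer of about {kda:.0f} kDa")
print(f"implied linker plus streptavidin portion: about {fusion_partner_kda:.0f} kDa")
```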

Antigen Binding

Filtered bacterial supernatants were applied to wells of a 96-well plate previously coated with 10 μg.ml⁻¹ NIP₁₅-BSA or 300 μg.ml⁻¹ hen egg lysozyme, and bound protein was detected by serial incubation with biotinylated Gαmλ antiserum and HRPO-streptavidin conjugate or RαS antiserum and HRPO-DαR antiserum. Colour changes were generated by incubation with ABTS and were monitored at 405 nm.

Only soluble pRAS112 protein (full length ScFv_(NP)-streptavidin monomer) can be detected in bacterial supernatants (FIG. 12). Filtered bacterial growth medium recovered after induction of pRAS112 (⊙), pRAS113 (♦) or pRAS114 (▪) protein was diluted in PBS and applied to wells of an ELISA plate coated with 10 μg.ml⁻¹ NIP₁₅-BSA. Bound protein was detected by serial incubation with rabbit α-streptavidin antisera and horseradish peroxidase conjugated donkey α-rabbit antisera, and colour changes generated by addition of ABTS were monitored at 405 nm. Like the parental ScFv_(NP), this protein binds NIP₁₅-BSA, but not lysozyme (FIG. 13). Filtered bacterial growth medium recovered after induction of pRAS112 protein was applied to wells of an ELISA plate coated with 10 μg.ml⁻¹ NIP₁₅-BSA (⊙) or 300 μg.ml⁻¹ hen egg lysozyme (♦). Bound protein was detected by serial incubation with rabbit α-streptavidin antisera and horseradish peroxidase conjugated donkey α-rabbit antisera, and colour changes generated by addition of ABTS were monitored at 405 nm.

Partial Purification of pRAS112 Protein

ScFv_(NP)-streptavidin fusion protein (pRAS112 protein) can be concentrated about 20-fold by precipitation from 50% saturated ammonium sulphate and dialysis against PBS. As expected, concentrated pRAS112 protein binds iminobiotin-Sepharose at pH 11, in contrast to parental ScFv_(NP) protein (FIG. 14). Concentrated proteins resolubilised in PBS after precipitation from 50% (pRAS112) or 80% (pRAS111) saturated ammonium sulphate were applied at pH 11 to an iminobiotin-Sepharose column (Pierce), and the antigen binding ability of material applied to the column (⊙) and material flowing through the column (♦) was measured by appropriate ELISA. a) pRAS112 protein, b) pRAS111 protein.

The avidity of streptavidin fusions can be compared with that of univalent ScFvs.

a) The slope of a NIP-specific ELISA performed using pRAS112 streptavidin fusion differs from that performed using pRAS111 scFv.

b) pRAS112 protein binding to NIP-BSA cannot be competed with free NP, free NIP or NIP-BSA, whereas pRAS111 scFv can.

c) pRAS112 protein cannot be eluted in a single pulse from a NP-Sepharose column. Multiple pulses of low pH interspersed with high pH washes are required to elute this protein. In contrast, pRAS111 scFv elutes with a single low pH step (FIG. 34).

This indicates that the streptavidin fusions (pRAS112) are binding multivalently.

A representation of the pRAS112 protein is shown in FIG. 15.

EXAMPLE 5 Construction of ScFv-BSRNase Fusion Molecules

Plasmid Construction

Plasmids for the expression of ScFv-BSRNase fusions are shown in FIG.16. The plasmid pRAS111 is described in Example 1, and the plasmid pBSV5is as described in Schein et al, loc. cit.

FIG. 21 shows the sequence of the ScFv-BSRNase fusion (4-OHnitrophenacetyl antibody) inserted between the HindIII and EcoRI sitesof plasmid pSP7 (available from Promega) to give plasmid pSPNPBSR asshown in FIG. 16.

FIG. 22 shows the sequence of the ScFv-BSRNase fusion (anti-humanplacental alkaline phosphatase antibody; H17E2) inserted between theHindIII and EcoRI sites of plasmid pSP71 to give plasmid pSPH17ΔXBSR asshown in FIG. 16.

The amino acid sequences of the V_(H) and V_(L) chains of H17E2 are disclosed in "Monoclonal Antibodies--applications in clinical oncology", pages 37-43, 1991, A. A. Epenetos, ed., Chapman & Hall, UK.

FIG. 23 shows the sequence of the ScFv-BSRNase fusion (anti-lysozyme antibody) inserted between the HindIII and EcoRI sites of plasmid pUC18 (available from Pharmacia) to give pUCD1.3BSR as shown in FIG. 16.

FIG. 17 shows a diagrammatic representation of the specific case where a heterodimer has been synthesised and purified (as described supra); in this case each of the ScFvs recognizes a different antigen on the same tumour cell.

The plasmids were made using standard methods of molecular biology as disclosed by Sambrook et al (1989) in Molecular Cloning, a laboratory manual, 2nd Edn, Cold Spring Harbor Laboratory Press, NY, USA.

The plasmid pSPNPBSR encodes a protein which directs the cytotoxin RNase to a target cell-specific molecule derivatised with NP or NIP. The plasmid pSPH17ΔXBSR encodes a protein which directs RNase to cells expressing the human placental alkaline phosphatase antigen. The ScFv encoded by this plasmid is derived from the monoclonal antibody H17E2 (see Table 1).

In addition to the fusion gene consisting of the H17E2 scFv and seminal RNase only (see above), fusion genes which incorporate one or more of the following are useful:

(i) A C-terminal "KDEL" sequence (endoplasmic reticulum retention signal), which may elevate cytotoxicity by increasing the retention of the protein in the cell and reducing its loss to other endosomal pathways.

(ii) A linker sequence at the N-terminus of the RNase to allow the N-terminus to be more flexible and increase the likelihood of forming dimers.

(iii) A disulphide loop-containing sequence, derived from diphtheria toxin, which allows the scFv and RNase to be linked via a disulphide bond, and permits efficient release of the RNase from the scFv once the cytotoxin has been internalised.

The plasmids which contain these genes (described diagrammatically in FIG. 28, with the individual nucleotide sequences encoding these proteins given in FIGS. 29 to 33) are otherwise identical to that expressing the original scFv-RNase fusion protein, i.e. only the DNA sequence of the actual cytotoxic molecule has been altered. The conditions for expression and refolding are as described in the earlier Examples.

Characterisation of the scFv-RNase Protein

RNase Activity of the Fusion Proteins.

All the fusions described (H17-BSRNase, H17-DT-BSRNase, H17-DT-BSRNaseKDEL, H17-DT-Link-BSRNase, H17-DT-Link-BSRNaseKDEL, H17-BSRNaseKDEL and H17-Link-BSRNaseKDEL) have RNA-degrading activity, as demonstrated by an RNase assay in which a sample of the refolded protein (10-50 ng of crude fusion protein) is incubated with 5 μg of RNA in a volume of 20 μl at 37° C. for 1 hr. In each case all the RNA was degraded, showing qualitative RNase activity in the fusion protein preparations.
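The quantities quoted for the RNase assay can be put on a common scale with a short calculation. The following is a minimal sketch only; the function names are illustrative and not part of the disclosure, and the numbers are simply those stated above (10-50 ng of crude protein, 5 μg of RNA, 20 μl reaction volume).

```python
# Assay quantities as quoted in the text; nothing else is assumed.
RNA_UG = 5.0                      # micrograms of RNA substrate per reaction
VOLUME_UL = 20.0                  # reaction volume in microlitres
PROTEIN_NG_RANGE = (10.0, 50.0)   # nanograms of crude fusion protein


def concentration_ug_per_ml(mass_ug, volume_ul):
    """Convert a mass (micrograms) in a volume (microlitres) to ug/ml."""
    return mass_ug / volume_ul * 1000.0


def substrate_to_enzyme_mass_ratio(rna_ug, protein_ng):
    """Mass ratio of RNA to protein, with the RNA mass converted to ng."""
    return rna_ug * 1000.0 / protein_ng


rna_conc = concentration_ug_per_ml(RNA_UG, VOLUME_UL)
protein_concs = [
    concentration_ug_per_ml(ng / 1000.0, VOLUME_UL) for ng in PROTEIN_NG_RANGE
]
ratios = [substrate_to_enzyme_mass_ratio(RNA_UG, ng) for ng in PROTEIN_NG_RANGE]

print(f"RNA concentration: {rna_conc:.0f} ug/ml")
print(f"Crude protein concentration: {protein_concs[0]:.1f}-{protein_concs[1]:.1f} ug/ml")
print(f"RNA:protein mass ratio: {ratios[1]:.0f}:1 to {ratios[0]:.0f}:1")
```

On these figures the RNA is present in a 100- to 500-fold mass excess over the crude protein. Because the preparation is crude, the true excess over active fusion protein may be larger.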

Antigen-binding Activity of Fusion Proteins.

All the fusion proteins demonstrate binding to the antigen human placental alkaline phosphatase (hPLAP) in an ELISA system. The detecting layers for the ELISA were anti-bovine seminal RNase polyclonal antibodies (from rabbit) and anti-rabbit polyclonal antibodies (from goat).

Evidence for the Dimeric Nature of the scFv-RNase.

Gel filtration experiments show the native molecular weight of the fusion proteins. Data from binding experiments indicate that the molecule has a higher avidity than the single-chain H17E2 antibody alone: the scFv will bind to an antigen affinity column (the antigen is placental alkaline phosphatase) and is eluted with a buffer consisting of 50 mM diethylamine (DEA), pH 12. The fusion protein, due to its higher avidity, cannot be eluted under these mild conditions, and harsher conditions are needed, eg 100 mM glycine, pH 2.2. Also, when the scFv, whole IgG H17E2 and fusion proteins are bound to their antigen on an ELISA plate and washed with copious amounts of 50 mM DEA, over 90% of the scFv is washed off, whereas only 40% of the whole IgG and fusion protein is washed off. Finally, the ELISA curves for the whole IgG H17E2 and the fusion protein have a similar shape (shallow slope), whereas that of the scFv has a steep slope. These experiments indicate that the scFv-RNase protein is dimeric.
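The wash-off figures above can be restated as retained fractions, which makes the difference between univalent and multivalent binding easier to see. This is an illustrative calculation only: the function name is hypothetical, and "over 90%" is treated as exactly 90%, so the fold difference obtained is a lower bound.

```python
def retained_fraction(washed_off_fraction):
    """Fraction of bound protein remaining on the plate after the DEA wash."""
    if not 0.0 <= washed_off_fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return 1.0 - washed_off_fraction


# Figures quoted in the text. "Over 90%" of the scFv is washed off,
# so 0.90 gives an upper bound on the scFv retained.
scfv_retained_max = retained_fraction(0.90)
multivalent_retained = retained_fraction(0.40)   # whole IgG and fusion protein

# Lower bound on how much more of the IgG/fusion protein stays bound.
fold_difference_min = multivalent_retained / scfv_retained_max

print(f"scFv retained: at most {scfv_retained_max:.0%}")
print(f"IgG and fusion protein retained: {multivalent_retained:.0%}")
print(f"Retention is at least {fold_difference_min:.0f}-fold higher")
```

The fusion protein behaves like the bivalent IgG and unlike the univalent scFv, which is consistent with the conclusion that the scFv-RNase protein is dimeric.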

Cytotoxicity of the Fusion Proteins Towards an Antigen-positive Cell-Line (HEp2).

HEp2 cells were seeded in 96-well microtitre plates at a density of 10⁵ cells per well and grown overnight in E4 culture media with 10% fetal calf serum (FCS). The next day, 10 μl of crude refolded fusion protein in PBS was added to each well and the cells were allowed to grow for 72 hr. Cell killing was detected using the Promega cell-titre 96 assay kit, which measures cell proliferation.

The scFv-BSRNase fusion protein consisting of a disulphide loop, KDEL and linker showed significant cell killing activity. The estimated final concentration of the cytotoxin was between 10 and 100 nM (see FIG. 27 for the results of these experiments).

EXAMPLE 6 Construction of ScFv-DNaseI Fusion Molecules Without a Nuclear Localization Signal

Plasmids for the expression of ScFv-DNaseI fusions are shown in FIG. 18. The plasmid pRAS111 is described in Example 1 and M13mp19DNAseRec5 is described in Worrall and Connolly, loc. cit.

FIG. 24 shows the sequence of the ScFv-DNaseI fusion (4-OH nitrophenacetyl antibody) inserted between the HindIII and BglII sites of plasmid pSP71 to give plasmid pSPNPDN1 as shown in FIG. 18.

FIG. 25 shows the sequence of the ScFv-DNaseI fusion (anti-human placental alkaline phosphatase antibody; H17E2) inserted between the HindIII and BglII sites of plasmid pSP71 to give plasmid pSPH17ΔXDN1 as shown in FIG. 18.

FIG. 26 shows the sequence of the ScFv-DNase fusion (anti-lysozyme antibody) inserted between the HindIII and BglII sites of plasmid pUC18 to give pUCD1.3DN1 as shown in FIG. 18.

The plasmids were made using standard methods of molecular biology as disclosed in Sambrook et al (1989) in Molecular Cloning, a laboratory manual, 2nd Edn, Cold Spring Harbor Laboratory Press, NY, USA.

The plasmid pSPNPDN1 encodes a protein which directs DNaseI to a target cell-specific molecule derivatised with NP or NIP.

The plasmid pSPH17ΔXDN1 encodes a protein which directs DNaseI to cells expressing the human placental alkaline phosphatase antigen. The ScFv encoded by this plasmid is derived from the monoclonal antibody H17E2 (see Table 1).

The scFv-DNase I fusion has been expressed under conditions identical to those used for the RNase fusions and refolded. The crude refolded preparation of the scFv-DNase I fusion protein shows PLAP-antigen binding activity in an ELISA system similar to that of the parent antibody H17E2. The detecting layers are anti-bovine DNase I (from rabbit) and anti-rabbit (from goat). The DNase I fusion protein also demonstrates DNA-degrading activity in a system similar to that of the RNase assay, except that 2 μg of DNA is incubated. The activity is only present when 10 mM CaCl₂ and 4 mM MgCl₂ are added, as is found with the naturally occurring bovine DNase I, suggesting that functional scFv-DNase fusion molecules have been expressed in E. coli and refolded.
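The sequence listing that follows gives, for each nucleic acid sequence, a CDS feature with a LOCATION range, and each is followed by the encoded protein. A reader may wish to check that the two agree. The sketch below is one way to do so under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and the nucleotide string is the opening of SEQ ID NO: 2 retyped with the layout marks (" - # ") removed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def cds_codon_count(start, end):
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of three")
    return length // 3


def translate(dna):
    """Translate a DNA string in frame 1 to one-letter amino acid codes."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# CDS ranges as stated in the listing for SEQ ID NOS: 2, 4 and 6.
print(cds_codon_count(40, 846))    # 269, the stated length of SEQ ID NO: 3
print(cds_codon_count(40, 1344))   # 435, the stated length of SEQ ID NO: 5
print(cds_codon_count(40, 1284))   # 415

# First 54 nucleotides of SEQ ID NO: 2; the CDS starts at position 40.
SEQ2_START = "AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATA ATG AAA TAC CTA TTG"
cleaned = "".join(SEQ2_START.split())
print(translate(cleaned[40 - 1:]))  # MKYLL (Met Lys Tyr Leu Leu)
```

The CDS ranges in the listing stop before the stop codon, which is why the codon counts equal the stated protein lengths exactly.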

    __________________________________________________________________________    #             SEQUENCE LISTING                                                - (1) GENERAL INFORMATION:                                                    -    (iii) NUMBER OF SEQUENCES: 29                                            - (2) INFORMATION FOR SEQ ID NO: 1:                                           -      (i) SEQUENCE CHARACTERISTICS:                                          #acids    (A) LENGTH: 7 amino                                                           (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: peptide                                             -    (iii) HYPOTHETICAL: NO                                                   #1:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - Pro Lys Lys Lys Arg Lys Val                                                 1               5                                                             - (2) INFORMATION FOR SEQ ID NO: 2:                                           -      (i) SEQUENCE CHARACTERISTICS:                                          #pairs    (A) LENGTH: 858 base                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: cDNA                                                -    (iii) HYPOTHETICAL: NO                                                   -     (ix) FEATURE:                                                                     (A) NAME/KEY: CDS                                                             (B) LOCATION: 40..846               
                                #2:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATA ATG AAA TAC - # CTA TTG            54                                                                          #       Met Lys Tyr Leu Leu                                                   #      5  1                                                                   - CCT ACG GCA GCC GCT GGA TTG TTA TTA CTC GC - #T GCC CAA CCA GCG ATG          102                                                                          Pro Thr Ala Ala Ala Gly Leu Leu Leu Leu Al - #a Ala Gln Pro Ala Met           #                 20                                                          - GCC CAG GTG CAG CTG CAG CAG CCT GGG GCT GA - #G CTT GTG AAG CCT GGG          150                                                                          Ala Gln Val Gln Leu Gln Gln Pro Gly Ala Gl - #u Leu Val Lys Pro Gly           #             35                                                              - GCT TCA GTG AAG CTG TCC TGC AAG GCT TCT GG - #C TAC ACC TTC ACC AGC          198                                                                          Ala Ser Val Lys Leu Ser Cys Lys Ala Ser Gl - #y Tyr Thr Phe Thr Ser           #         50                                                                  - TAC TGG ATG CAC TGG GTG AAG CAG AGG CCT GG - #A CGA GGC CTT GAG TGG          246                                                                          Tyr Trp Met His Trp Val Lys Gln Arg Pro Gl - #y Arg Gly Leu Glu Trp           #     65                                                                      - ATT GGA AGG ATT GAT CCT AAT AGT GGT GGT AC - #T AAG TAC AAT GAG AAG          294                                                                          Ile Gly Arg Ile Asp Pro Asn Ser Gly Gly Th - #r Lys Tyr Asn Glu Lys           # 85                                                                          - TTC AAG AGC AAG 
GCC ACA CTG ACT GTA GAC AA - #A CCC TCC AGC ACA GCC          342                                                                          Phe Lys Ser Lys Ala Thr Leu Thr Val Asp Ly - #s Pro Ser Ser Thr Ala           #                100                                                          - TAC ATG CAG CTC AGC AGC CTG ACA TCT GAG GA - #C TCT GCG GTC TAT TAT          390                                                                          Tyr Met Gln Leu Ser Ser Leu Thr Ser Glu As - #p Ser Ala Val Tyr Tyr           #           115                                                               - TGT GCA AGA TAC GAT TAC TAC GGT AGT AGC TA - #C TTT GAC TAC TGG GGC          438                                                                          Cys Ala Arg Tyr Asp Tyr Tyr Gly Ser Ser Ty - #r Phe Asp Tyr Trp Gly           #       130                                                                   - CAA GGG ACC ACG GTC ACC GTC TCC TCA GGT GG - #A GGC GGT TCA GGC GGA          486                                                                          Gln Gly Thr Thr Val Thr Val Ser Ser Gly Gl - #y Gly Gly Ser Gly Gly           #   145                                                                       - GGT GGC TCT GGC GGT GGC GGA TCC CAG GCT GT - #T GTG ACT CAG GAA TCT          534                                                                          Gly Gly Ser Gly Gly Gly Gly Ser Gln Ala Va - #l Val Thr Gln Glu Ser           150                 1 - #55                 1 - #60                 1 -       #65                                                                           - GCA CTC ACC ACA TCA CCT GGT GAA ACA GTC AC - #A CTC ACT TGT CGC TCA          582                                                                          Ala Leu Thr Thr Ser Pro Gly Glu Thr Val Th - #r Leu Thr Cys Arg Ser           #               180                                                           - AGT ACT GGG GCT GTT ACA ACT AGT AAC TAT GC - #C AAC TGG GTC CAA 
GAA          630                                                                          Ser Thr Gly Ala Val Thr Thr Ser Asn Tyr Al - #a Asn Trp Val Gln Glu           #           195                                                               - AAA CCA GAT CAT TTA TTC ACT GGT CTA ATA GG - #T GGT ACC AAC AAC CGA          678                                                                          Lys Pro Asp His Leu Phe Thr Gly Leu Ile Gl - #y Gly Thr Asn Asn Arg           #       210                                                                   - GCT CCA GGT GTT CCT GCC AGA TTC TCA GGC TC - #C CTG ATT GGA GAC AAG          726                                                                          Ala Pro Gly Val Pro Ala Arg Phe Ser Gly Se - #r Leu Ile Gly Asp Lys           #   225                                                                       - GCT GCC CTC ACC ATC ACA GGG GCA CAG ACT GA - #G GAT GAG GCA ATA TAT          774                                                                          Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Gl - #u Asp Glu Ala Ile Tyr           230                 2 - #35                 2 - #40                 2 -       #45                                                                           - TTC TGT GCT CTA TGG TAC AGC AAC CAC TGG GT - #G TTC GGT GGA GGA ACC          822                                                                          Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Va - #l Phe Gly Gly Gly Thr           #               260                                                           #      858ACT GTC CTA GGT CTC GAG TAATAAGAAT TC - #                           Lys Leu Thr Val Leu Gly Leu Glu                                                           265                                                               - (2) INFORMATION FOR SEQ ID NO: 3:                                           -      (i) SEQUENCE CHARACTERISTICS:                                          #acids    (A) LENGTH: 269 amino       
                                                  (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: protein                                             #3:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - Met Lys Tyr Leu Leu Pro Thr Ala Ala Ala Gl - #y Leu Leu Leu Leu Ala         #                 15                                                          - Ala Gln Pro Ala Met Ala Gln Val Gln Leu Gl - #n Gln Pro Gly Ala Glu         #             30                                                              - Leu Val Lys Pro Gly Ala Ser Val Lys Leu Se - #r Cys Lys Ala Ser Gly         #         45                                                                  - Tyr Thr Phe Thr Ser Tyr Trp Met His Trp Va - #l Lys Gln Arg Pro Gly         #     60                                                                      - Arg Gly Leu Glu Trp Ile Gly Arg Ile Asp Pr - #o Asn Ser Gly Gly Thr         # 80                                                                          - Lys Tyr Asn Glu Lys Phe Lys Ser Lys Ala Th - #r Leu Thr Val Asp Lys         #                 95                                                          - Pro Ser Ser Thr Ala Tyr Met Gln Leu Ser Se - #r Leu Thr Ser Glu Asp         #           110                                                               - Ser Ala Val Tyr Tyr Cys Ala Arg Tyr Asp Ty - #r Tyr Gly Ser Ser Tyr         #       125                                                                   - Phe Asp Tyr Trp Gly Gln Gly Thr Thr Val Th - #r Val Ser Ser Gly Gly         #   140                                                                       - Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gl - #y Gly Ser Gln Ala Val         145                 1 - #50                 1 - #55                 1 -       #60                                                                           - Val Thr 
Gln Glu Ser Ala Leu Thr Thr Ser Pr - #o Gly Glu Thr Val Thr         #               175                                                           - Leu Thr Cys Arg Ser Ser Thr Gly Ala Val Th - #r Thr Ser Asn Tyr Ala         #           190                                                               - Asn Trp Val Gln Glu Lys Pro Asp His Leu Ph - #e Thr Gly Leu Ile Gly         #       205                                                                   - Gly Thr Asn Asn Arg Ala Pro Gly Val Pro Al - #a Arg Phe Ser Gly Ser         #   220                                                                       - Leu Ile Gly Asp Lys Ala Ala Leu Thr Ile Th - #r Gly Ala Gln Thr Glu         225                 2 - #30                 2 - #35                 2 -       #40                                                                           - Asp Glu Ala Ile Tyr Phe Cys Ala Leu Trp Ty - #r Ser Asn His Trp Val         #               255                                                           - Phe Gly Gly Gly Thr Lys Leu Thr Val Leu Gl - #y Leu Glu                     #           265                                                               - (2) INFORMATION FOR SEQ ID NO: 4:                                           -      (i) SEQUENCE CHARACTERISTICS:                                          #pairs    (A) LENGTH: 1356 base                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: cDNA                                                -    (iii) HYPOTHETICAL: NO                                                   -     (ix) FEATURE:                                                                     (A) NAME/KEY: CDS                                                             (B) LOCATION: 40..1344                            
                  #4:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATA ATG AAA TAC - # CTA TTG            54                                                                          #       Met Lys Tyr Leu Leu                                                   #      5  1                                                                   - CCT ACG GCA GCC GCT GGA TTG TTA TTA CTC GC - #T GCC CAA CCA GCG ATG          102                                                                          Pro Thr Ala Ala Ala Gly Leu Leu Leu Leu Al - #a Ala Gln Pro Ala Met           #                 20                                                          - GCC CAG GTG CAG CTG CAG CAG CCT GGG GCT GA - #G CTT GTG AAG CCT GGG          150                                                                          Ala Gln Val Gln Leu Gln Gln Pro Gly Ala Gl - #u Leu Val Lys Pro Gly           #             35                                                              - GCT TCA GTG AAG CTG TCC TGC AAG GCT TCT GG - #C TAC ACC TTC ACC AGC          198                                                                          Ala Ser Val Lys Leu Ser Cys Lys Ala Ser Gl - #y Tyr Thr Phe Thr Ser           #         50                                                                  - TAC TGG ATG CAC TGG GTG AAG CAG AGG CCT GG - #A CGA GGC CTT GAG TGG          246                                                                          Tyr Trp Met His Trp Val Lys Gln Arg Pro Gl - #y Arg Gly Leu Glu Trp           #     65                                                                      - ATT GGA AGG ATT GAT CCT AAT AGT GGT GGT AC - #T AAG TAC AAT GAG AAG          294                                                                          Ile Gly Arg Ile Asp Pro Asn Ser Gly Gly Th - #r Lys Tyr Asn Glu Lys           # 85                                                                          - TTC AAG AGC AAG GCC ACA CTG 
ACT GTA GAC AA - #A CCC TCC AGC ACA GCC          342                                                                          Phe Lys Ser Lys Ala Thr Leu Thr Val Asp Ly - #s Pro Ser Ser Thr Ala           #                100                                                          - TAC ATG CAG CTC AGC AGC CTG ACA TCT GAG GA - #C TCT GCG GTC TAT TAT          390                                                                          Tyr Met Gln Leu Ser Ser Leu Thr Ser Glu As - #p Ser Ala Val Tyr Tyr           #           115                                                               - TGT GCA AGA TAC GAT TAC TAC GGT AGT AGC TA - #C TTT GAC TAC TGG GGC          438                                                                          Cys Ala Arg Tyr Asp Tyr Tyr Gly Ser Ser Ty - #r Phe Asp Tyr Trp Gly           #       130                                                                   - CAA GGG ACC ACG GTC ACC GTC TCC TCA GGT GG - #A GGC GGT TCA GGC GGA          486                                                                          Gln Gly Thr Thr Val Thr Val Ser Ser Gly Gl - #y Gly Gly Ser Gly Gly           #   145                                                                       - GGT GGC TCT GGC GGT GGC GGA TCC CAG GCT GT - #T GTG ACT CAG GAA TCT          534                                                                          Gly Gly Ser Gly Gly Gly Gly Ser Gln Ala Va - #l Val Thr Gln Glu Ser           150                 1 - #55                 1 - #60                 1 -       #65                                                                           - GCA CTC ACC ACA TCA CCT GGT GAA ACA GTC AC - #A CTC ACT TGT CGC TCA          582                                                                          Ala Leu Thr Thr Ser Pro Gly Glu Thr Val Th - #r Leu Thr Cys Arg Ser           #               180                                                           - AGT ACT GGG GCT GTT ACA ACT AGT AAC TAT GC - #C AAC TGG GTC CAA GAA          
630                                                                          Ser Thr Gly Ala Val Thr Thr Ser Asn Tyr Al - #a Asn Trp Val Gln Glu           #           195                                                               - AAA CCA GAT CAT TTA TTC ACT GGT CTA ATA GG - #T GGT ACC AAC AAC CGA          678                                                                          Lys Pro Asp His Leu Phe Thr Gly Leu Ile Gl - #y Gly Thr Asn Asn Arg           #       210                                                                   - GCT CCA GGT GTT CCT GCC AGA TTC TCA GGC TC - #C CTG ATT GGA GAC AAG          726                                                                          Ala Pro Gly Val Pro Ala Arg Phe Ser Gly Se - #r Leu Ile Gly Asp Lys           #   225                                                                       - GCT GCC CTC ACC ATC ACA GGG GCA CAG ACT GA - #G GAT GAG GCA ATA TAT          774                                                                          Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Gl - #u Asp Glu Ala Ile Tyr           230                 2 - #35                 2 - #40                 2 -       #45                                                                           - TTC TGT GCT CTA TGG TAC AGC AAC CAC TGG GT - #G TTC GGT GGA GGA ACC          822                                                                          Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Va - #l Phe Gly Gly Gly Thr           #               260                                                           - AAA CTG ACT GTC CTA GGT CTC GAG GCA CCT GC - #T GCC GCA CCT GCA GAC          870                                                                          Lys Leu Thr Val Leu Gly Leu Glu Ala Pro Al - #a Ala Ala Pro Ala Asp           #           275                                                               - CCG TCC AAG GAC TCC AAA GCT CAG GTT TCT GC - #A GCC GAA GCT GGT ATC          918                                               
                           Pro Ser Lys Asp Ser Lys Ala Gln Val Ser Al - #a Ala Glu Ala Gly Ile           #       290                                                                   - ACT GGC ACC TGG TAT AAC CAA CTG GGG TCG AC - #T TTC ATT GTG ACC GCT          966                                                                          Thr Gly Thr Trp Tyr Asn Gln Leu Gly Ser Th - #r Phe Ile Val Thr Ala           #   305                                                                       - GGT GCG GAC GGA GCT CTG ACT GGC ACC TAC GA - #A TCT GCG GTT GGT AAC         1014                                                                          Gly Ala Asp Gly Ala Leu Thr Gly Thr Tyr Gl - #u Ser Ala Val Gly Asn           310                 3 - #15                 3 - #20                 3 -       #25                                                                           - GCA GAA TCC CGC TAC GTA CTG ACT GGC CGT TA - #T GAC TCT GCA CCT GCC         1062                                                                          Ala Glu Ser Arg Tyr Val Leu Thr Gly Arg Ty - #r Asp Ser Ala Pro Ala           #               340                                                           - ACC GAT GGC TCT GGT ACC GCT CTG GGC TGG AC - #T GTG GCT TGG AAA AAC         1110                                                                          Thr Asp Gly Ser Gly Thr Ala Leu Gly Trp Th - #r Val Ala Trp Lys Asn           #           355                                                               - AAC TAT CGT AAT GCG CAC AGC GCC ACT ACG TG - #G TCT GGC CAA TAC GTT         1158                                                                          Asn Tyr Arg Asn Ala His Ser Ala Thr Thr Tr - #p Ser Gly Gln Tyr Val           #       370                                                                   - GGC GGT GCT GAG GCT CGT ATC AAC ACT CAG TG - #G CTG TTA ACA TCC GGC         1206                                                                          Gly Gly Ala Glu Ala 
Arg Ile Asn Thr Gln Tr - #p Leu Leu Thr Ser Gly           #   385                                                                       - ACT ACC GAA GCG AAT GCA TGG AAA TCG ACA CT - #A GTA GGT CAT GAC ACC         1254                                                                          Thr Thr Glu Ala Asn Ala Trp Lys Ser Thr Le - #u Val Gly His Asp Thr           390                 3 - #95                 4 - #00                 4 -       #05                                                                           - TTT ACC AAA GTT AAG CCT TCT GCT GCT AGC AT - #T GAT GCT GCC AAG AAA         1302                                                                          Phe Thr Lys Val Lys Pro Ser Ala Ala Ser Il - #e Asp Ala Ala Lys Lys           #               420                                                           - GCA GGC GTA AAC AAC GGT AAC CCT CTA GAC GC - #T GTT CAG CAA                 #1344                                                                         Ala Gly Val Asn Asn Gly Asn Pro Leu Asp Al - #a Val Gln Gln                   #           435                                                               #     1356                                                                    - (2) INFORMATION FOR SEQ ID NO: 5:                                           -      (i) SEQUENCE CHARACTERISTICS:                                          #acids    (A) LENGTH: 435 amino                                                         (B) TYPE: amino acid                                                          (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: protein                                             #5:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - Met Lys Tyr Leu Leu Pro Thr Ala Ala Ala Gl - #y Leu Leu Leu Leu Ala         #                 15                                                          - Ala Gln Pro Ala Met Ala Gln Val Gln Leu Gl - #n Gln Pro Gly Ala Glu 
        #             30                                                              - Leu Val Lys Pro Gly Ala Ser Val Lys Leu Se - #r Cys Lys Ala Ser Gly         #         45                                                                  - Tyr Thr Phe Thr Ser Tyr Trp Met His Trp Va - #l Lys Gln Arg Pro Gly         #     60                                                                      - Arg Gly Leu Glu Trp Ile Gly Arg Ile Asp Pr - #o Asn Ser Gly Gly Thr         # 80                                                                          - Lys Tyr Asn Glu Lys Phe Lys Ser Lys Ala Th - #r Leu Thr Val Asp Lys         #                 95                                                          - Pro Ser Ser Thr Ala Tyr Met Gln Leu Ser Se - #r Leu Thr Ser Glu Asp         #           110                                                               - Ser Ala Val Tyr Tyr Cys Ala Arg Tyr Asp Ty - #r Tyr Gly Ser Ser Tyr         #       125                                                                   - Phe Asp Tyr Trp Gly Gln Gly Thr Thr Val Th - #r Val Ser Ser Gly Gly         #   140                                                                       - Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gl - #y Gly Ser Gln Ala Val         145                 1 - #50                 1 - #55                 1 -       #60                                                                           - Val Thr Gln Glu Ser Ala Leu Thr Thr Ser Pr - #o Gly Glu Thr Val Thr         #               175                                                           - Leu Thr Cys Arg Ser Ser Thr Gly Ala Val Th - #r Thr Ser Asn Tyr Ala         #           190                                                               - Asn Trp Val Gln Glu Lys Pro Asp His Leu Ph - #e Thr Gly Leu Ile Gly         #       205                                                                   - Gly Thr Asn Asn Arg Ala Pro Gly Val Pro Al - #a Arg Phe Ser Gly Ser         #   220                                   
Leu Ile Gly Asp Lys Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Glu
225                 230                 235                 240

Asp Glu Ala Ile Tyr Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Val
                                                        255

Phe Gly Gly Gly Thr Lys Leu Thr Val Leu Gly Leu Glu Ala Pro Ala
                                                    270

Ala Ala Pro Ala Asp Pro Ser Lys Asp Ser Lys Ala Gln Val Ser Ala
                                                285

Ala Glu Ala Gly Ile Thr Gly Thr Trp Tyr Asn Gln Leu Gly Ser Thr
                                            300

Phe Ile Val Thr Ala Gly Ala Asp Gly Ala Leu Thr Gly Thr Tyr Glu
305                 310                 315                 320

Ser Ala Val Gly Asn Ala Glu Ser Arg Tyr Val Leu Thr Gly Arg Tyr
                                                        335

Asp Ser Ala Pro Ala Thr Asp Gly Ser Gly Thr Ala Leu Gly Trp Thr
                                                    350

Val Ala Trp Lys Asn Asn Tyr Arg Asn Ala His Ser Ala Thr Thr Trp
                                                365

Ser Gly Gln Tyr Val Gly Gly Ala Glu Ala Arg Ile Asn Thr Gln Trp
                                            380

Leu Leu Thr Ser Gly Thr Thr Glu Ala Asn Ala Trp Lys Ser Thr Leu
385                 390                 395                 400

Val Gly His Asp Thr Phe Thr Lys Val Lys Pro Ser Ala Ala Ser Ile
                                                        415

Asp Ala Ala Lys Lys Ala Gly Val Asn Asn Gly Asn Pro Leu Asp Ala
                                                    430

Val Gln Gln
        435

(2) INFORMATION FOR SEQ ID NO: 6:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1296 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

   (iii) HYPOTHETICAL: NO

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 40..1284

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATA ATG AAA TAC CTA TTG     54
                                           Met Lys Tyr Leu Leu
                                             1               5

CCT ACG GCA GCC GCT GGA TTG TTA TTA CTC GCT GCC CAA CCA GCG ATG   102
Pro Thr Ala Ala Ala Gly Leu Leu Leu Leu Ala Ala Gln Pro Ala Met
                                                         20

GCC CAG GTG CAG CTG CAG CAG CCT GGG GCT GAG CTT GTG AAG CCT GGG   150
Ala Gln Val Gln Leu Gln Gln Pro Gly Ala Glu Leu Val Lys Pro Gly
                                                     35

GCT TCA GTG AAG CTG TCC TGC AAG GCT TCT GGC TAC ACC TTC ACC AGC   198
Ala Ser Val Lys Leu Ser Cys Lys Ala Ser Gly Tyr Thr Phe Thr Ser
                                                 50

TAC TGG ATG CAC TGG GTG AAG CAG AGG CCT GGA CGA GGC CTT GAG TGG   246
Tyr Trp Met His Trp Val Lys Gln Arg Pro Gly Arg Gly Leu Glu Trp
                                             65

ATT GGA AGG ATT GAT CCT AAT AGT GGT GGT ACT AAG TAC AAT GAG AAG   294
Ile Gly Arg Ile Asp Pro Asn Ser Gly Gly Thr Lys Tyr Asn Glu Lys
                                                             85

TTC AAG AGC AAG GCC ACA CTG ACT GTA GAC AAA CCC TCC AGC ACA GCC   342
Phe Lys Ser Lys Ala Thr Leu Thr Val Asp Lys Pro Ser Ser Thr Ala
                                                        100

TAC ATG CAG CTC AGC AGC CTG ACA TCT GAG GAC TCT GCG GTC TAT TAT   390
Tyr Met Gln Leu Ser Ser Leu Thr Ser Glu Asp Ser Ala Val Tyr Tyr
                                                    115

TGT GCA AGA TAC GAT TAC TAC GGT AGT AGC TAC TTT GAC TAC TGG GGC   438
Cys Ala Arg Tyr Asp Tyr Tyr Gly Ser Ser Tyr Phe Asp Tyr Trp Gly
                                                130

CAA GGG ACC ACG GTC ACC GTC TCC TCA GGT GGA GGC GGT TCA GGC GGA   486
Gln Gly Thr Thr Val Thr Val Ser Ser Gly Gly Gly Gly Ser Gly Gly
                                            145

GGT GGC TCT GGC GGT GGC GGA TCC CAG GCT GTT GTG ACT CAG GAA TCT   534
Gly Gly Ser Gly Gly Gly Gly Ser Gln Ala Val Val Thr Gln Glu Ser
150                 155                 160                 165

GCA CTC ACC ACA TCA CCT GGT GAA ACA GTC ACA CTC ACT TGT CGC TCA   582
Ala Leu Thr Thr Ser Pro Gly Glu Thr Val Thr Leu Thr Cys Arg Ser
                                                        180

AGT ACT GGG GCT GTT ACA ACT AGT AAC TAT GCC AAC TGG GTC CAA GAA   630
Ser Thr Gly Ala Val Thr Thr Ser Asn Tyr Ala Asn Trp Val Gln Glu
                                                    195

AAA CCA GAT CAT TTA TTC ACT GGT CTA ATA GGT GGT ACC AAC AAC CGA   678
Lys Pro Asp His Leu Phe Thr Gly Leu Ile Gly Gly Thr Asn Asn Arg
                                                210

GCT CCA GGT GTT CCT GCC AGA TTC TCA GGC TCC CTG ATT GGA GAC AAG   726
Ala Pro Gly Val Pro Ala Arg Phe Ser Gly Ser Leu Ile Gly Asp Lys
                                            225

GCT GCC CTC ACC ATC ACA GGG GCA CAG ACT GAG GAT GAG GCA ATA TAT   774
Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Glu Asp Glu Ala Ile Tyr
230                 235                 240                 245

TTC TGT GCT CTA TGG TAC AGC AAC CAC TGG GTG TTC GGT GGA GGA ACC   822
Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Val Phe Gly Gly Gly Thr
                                                        260

AAA CTG ACT GTC CTA GGT CTC GAG GCA CCT GCT GCC GCA CCT GCA GAC   870
Lys Leu Thr Val Leu Gly Leu Glu Ala Pro Ala Ala Ala Pro Ala Asp
                                                    275

CCG TCC AAG GAC TCC AAA GCT CAG GTT TCT GCA GCC GAA GCT GGT ATC   918
Pro Ser Lys Asp Ser Lys Ala Gln Val Ser Ala Ala Glu Ala Gly Ile
                                                290

ACT GGC ACC TGG TAT AAC CAA CTG GGG TCG ACT TTC ATT GTG ACC GCT   966
Thr Gly Thr Trp Tyr Asn Gln Leu Gly Ser Thr Phe Ile Val Thr Ala
                                            305

GGT GCG GAC GGA GCT CTG ACT GGC ACC TAC GAA TCT GCG GTT GGT AAC  1014
Gly Ala Asp Gly Ala Leu Thr Gly Thr Tyr Glu Ser Ala Val Gly Asn
310                 315                 320                 325

GCA GAA TCC CGC TAC GTA CTG ACT GGC CGT TAT GAC TCT GCA CCT GCC  1062
Ala Glu Ser Arg Tyr Val Leu Thr Gly Arg Tyr Asp Ser Ala Pro Ala
                                                        340

ACC GAT GGC TCT GGT ACC GCT CTG GGC TGG ACT GTG GCT TGG AAA AAC  1110
Thr Asp Gly Ser Gly Thr Ala Leu Gly Trp Thr Val Ala Trp Lys Asn
                                                    355

AAC TAT CGT AAT GCG CAC AGC GCC ACT ACG TGG TCT GGC CAA TAC GTT  1158
Asn Tyr Arg Asn Ala His Ser Ala Thr Thr Trp Ser Gly Gln Tyr Val
                                                370

GGC GGT GCT GAG GCT CGT ATC AAC ACT CAG TGG CTG TTA ACA TCC GGC  1206
Gly Gly Ala Glu Ala Arg Ile Asn Thr Gln Trp Leu Leu Thr Ser Gly
                                            385

ACT ACC GAA GCG AAT GCA TGG AAA TCG ACA CTA GTA GGT CAT GAC ACC  1254
Thr Thr Glu Ala Asn Ala Trp Lys Ser Thr Leu Val Gly His Asp Thr
390                 395                 400                 405

TTT ACC AAA GTT AAG CCT TCT GCT GCT AGC TAATAAGAAT TC            1296
Phe Thr Lys Val Lys Pro Ser Ala Ala Ser
                                    415

(2) INFORMATION FOR SEQ ID NO: 7:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 415 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Lys Tyr Leu Leu Pro Thr Ala Ala Ala Gly Leu Leu Leu Leu Ala
                                                         15

Ala Gln Pro Ala Met Ala Gln Val Gln Leu Gln Gln Pro Gly Ala Glu
                                                     30

Leu Val Lys Pro Gly Ala Ser Val Lys Leu Ser Cys Lys Ala Ser Gly
                                                 45

Tyr Thr Phe Thr Ser Tyr Trp Met His Trp Val Lys Gln Arg Pro Gly
                                             60

Arg Gly Leu Glu Trp Ile Gly Arg Ile Asp Pro Asn Ser Gly Gly Thr
                                                             80

Lys Tyr Asn Glu Lys Phe Lys Ser Lys Ala Thr Leu Thr Val Asp Lys
                                                         95

Pro Ser Ser Thr Ala Tyr Met Gln Leu Ser Ser Leu Thr Ser Glu Asp
                                                    110

Ser Ala Val Tyr Tyr Cys Ala Arg Tyr Asp Tyr Tyr Gly Ser Ser Tyr
                                                125

Phe Asp Tyr Trp Gly Gln Gly Thr Thr Val Thr Val Ser Ser Gly Gly
                                            140

Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gln Ala Val
145                 150                 155                 160

Val Thr Gln Glu Ser Ala Leu Thr Thr Ser Pro Gly Glu Thr Val Thr
                                                        175

Leu Thr Cys Arg Ser Ser Thr Gly Ala Val Thr Thr Ser Asn Tyr Ala
                                                    190

Asn Trp Val Gln Glu Lys Pro Asp His Leu Phe Thr Gly Leu Ile Gly
                                                205

Gly Thr Asn Asn Arg Ala Pro Gly Val Pro Ala Arg Phe Ser Gly Ser
                                            220

Leu Ile Gly Asp Lys Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Glu
225                 230                 235                 240

Asp Glu Ala Ile Tyr Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Val
                                                        255

Phe Gly Gly Gly Thr Lys Leu Thr Val Leu Gly Leu Glu Ala Pro Ala
                                                    270

Ala Ala Pro Ala Asp Pro Ser Lys Asp Ser Lys Ala Gln Val Ser Ala
                                                285

Ala Glu Ala Gly Ile Thr Gly Thr Trp Tyr Asn Gln Leu Gly Ser Thr
                                            300

Phe Ile Val Thr Ala Gly Ala Asp Gly Ala Leu Thr Gly Thr Tyr Glu
305                 310                 315                 320

Ser Ala Val Gly Asn Ala Glu Ser Arg Tyr Val Leu Thr Gly Arg Tyr
                                                        335

Asp Ser Ala Pro Ala Thr Asp Gly Ser Gly Thr Ala Leu Gly Trp Thr
                                                    350

Val Ala Trp Lys Asn Asn Tyr Arg Asn Ala His Ser Ala Thr Thr Trp
                                                365

Ser Gly Gln Tyr Val Gly Gly Ala Glu Ala Arg Ile Asn Thr Gln Trp
                                            380

Leu Leu Thr Ser Gly Thr Thr Glu Ala Asn Ala Trp Lys Ser Thr Leu
385                 390                 395                 400

Val Gly His Asp Thr Phe Thr Lys Val Lys Pro Ser Ala Ala Ser
                                                        415

(2) INFORMATION FOR SEQ ID NO: 8:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1257 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

   (iii) HYPOTHETICAL: NO

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 40..1245

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATA ATG AAA TAC CTA TTG     54
                                           Met Lys Tyr Leu Leu
                                             1               5

CCT ACG GCA GCC GCT GGA TTG TTA TTA CTC GCT GCC CAA CCA GCG ATG   102
Pro Thr Ala Ala Ala Gly Leu Leu Leu Leu Ala Ala Gln Pro Ala Met
                                                         20

GCC CAG GTG CAG CTG CAG CAG CCT GGG GCT GAG CTT GTG AAG CCT GGG   150
Ala Gln Val Gln Leu Gln Gln Pro Gly Ala Glu Leu Val Lys Pro Gly
                                                     35

GCT TCA GTG AAG CTG TCC TGC AAG GCT TCT GGC TAC ACC TTC ACC AGC   198
Ala Ser Val Lys Leu Ser Cys Lys Ala Ser Gly Tyr Thr Phe Thr Ser
                                                 50

TAC TGG ATG CAC TGG GTG AAG CAG AGG CCT GGA CGA GGC CTT GAG TGG   246
Tyr Trp Met His Trp Val Lys Gln Arg Pro Gly Arg Gly Leu Glu Trp
                                             65

ATT GGA AGG ATT GAT CCT AAT AGT GGT GGT ACT AAG TAC AAT GAG AAG   294
Ile Gly Arg Ile Asp Pro Asn Ser Gly Gly Thr Lys Tyr Asn Glu Lys
                                                             85

TTC AAG AGC AAG GCC ACA CTG ACT GTA GAC AAA CCC TCC AGC ACA GCC   342
Phe Lys Ser Lys Ala Thr Leu Thr Val Asp Lys Pro Ser Ser Thr Ala
                                                        100

TAC ATG CAG CTC AGC AGC CTG ACA TCT GAG GAC TCT GCG GTC TAT TAT   390
Tyr Met Gln Leu Ser Ser Leu Thr Ser Glu Asp Ser Ala Val Tyr Tyr
                                                    115

TGT GCA AGA TAC GAT TAC TAC GGT AGT AGC TAC TTT GAC TAC TGG GGC   438
Cys Ala Arg Tyr Asp Tyr Tyr Gly Ser Ser Tyr Phe Asp Tyr Trp Gly
                                                130

CAA GGG ACC ACG GTC ACC GTC TCC TCA GGT GGA GGC GGT TCA GGC GGA   486
Gln Gly Thr Thr Val Thr Val Ser Ser Gly Gly Gly Gly Ser Gly Gly
                                            145

GGT GGC TCT GGC GGT GGC GGA TCC CAG GCT GTT GTG ACT CAG GAA TCT   534
Gly Gly Ser Gly Gly Gly Gly Ser Gln Ala Val Val Thr Gln Glu Ser
150                 155                 160                 165

GCA CTC ACC ACA TCA CCT GGT GAA ACA GTC ACA CTC ACT TGT CGC TCA   582
Ala Leu Thr Thr Ser Pro Gly Glu Thr Val Thr Leu Thr Cys Arg Ser
                                                        180

AGT ACT GGG GCT GTT ACA ACT AGT AAC TAT GCC AAC TGG GTC CAA GAA   630
Ser Thr Gly Ala Val Thr Thr Ser Asn Tyr Ala Asn Trp Val Gln Glu
                                                    195

AAA CCA GAT CAT TTA TTC ACT GGT CTA ATA GGT GGT ACC AAC AAC CGA   678
Lys Pro Asp His Leu Phe Thr Gly Leu Ile Gly Gly Thr Asn Asn Arg
                                                210

GCT CCA GGT GTT CCT GCC AGA TTC TCA GGC TCC CTG ATT GGA GAC AAG   726
Ala Pro Gly Val Pro Ala Arg Phe Ser Gly Ser Leu Ile Gly Asp Lys
                                            225

GCT GCC CTC ACC ATC ACA GGG GCA CAG ACT GAG GAT GAG GCA ATA TAT   774
Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Glu Asp Glu Ala Ile Tyr
230                 235                 240                 245

TTC TGT GCT CTA TGG TAC AGC AAC CAC TGG GTG TTC GGT GGA GGA ACC   822
Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Val Phe Gly Gly Gly Thr
                                                        260

AAA CTG ACT GTC CTA GGT CTC GAG GCA CCT GCT GCC GCA CCT GCC GAA   870
Lys Leu Thr Val Leu Gly Leu Glu Ala Pro Ala Ala Ala Pro Ala Glu
                                                    275

GCT GGT ATC ACT GGC ACC TGG TAT AAC CAA CTG GGG TCG ACT TTC ATT   918
Ala Gly Ile Thr Gly Thr Trp Tyr Asn Gln Leu Gly Ser Thr Phe Ile
                                                290

GTG ACC GCT GGT GCG GAC GGA GCT CTG ACT GGC ACC TAC GAA TCT GCG   966
Val Thr Ala Gly Ala Asp Gly Ala Leu Thr Gly Thr Tyr Glu Ser Ala
                                            305

GTT GGT AAC GCA GAA TCC CGC TAC GTA CTG ACT GGC CGT TAT GAC TCT  1014
Val Gly Asn Ala Glu Ser Arg Tyr Val Leu Thr Gly Arg Tyr Asp Ser
310                 315                 320                 325

GCA CCT GCC ACC GAT GGC TCT GGT ACC GCT CTG GGC TGG ACT GTG GCT  1062
Ala Pro Ala Thr Asp Gly Ser Gly Thr Ala Leu Gly Trp Thr Val Ala
                                                        340

TGG AAA AAC AAC TAT CGT AAT GCG CAC AGC GCC ACT ACG TGG TCT GGC  1110
Trp Lys Asn Asn Tyr Arg Asn Ala His Ser Ala Thr Thr Trp Ser Gly
                                                    355

CAA TAC GTT GGC GGT GCT GAG GCT CGT ATC AAC ACT CAG TGG CTG TTA  1158
Gln Tyr Val Gly Gly Ala Glu Ala Arg Ile Asn Thr Gln Trp Leu Leu
                                                370

ACA TCC GGC ACT ACC GAA GCG AAT GCA TGG AAA TCG ACA CTA GTA GGT  1206
Thr Ser Gly Thr Thr Glu Ala Asn Ala Trp Lys Ser Thr Leu Val Gly
                                            385

CAT GAC ACC TTT ACC AAA GTT AAG CCT TCT GCT GCT AGC TAATAAGAAT   1255
His Asp Thr Phe Thr Lys Val Lys Pro Ser Ala Ala Ser
390                 395                 400

                                                                 1257

(2) INFORMATION FOR SEQ ID NO: 9:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 402 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Lys Tyr Leu Leu Pro Thr Ala Ala Ala Gly Leu Leu Leu Leu Ala
                                                         15

Ala Gln Pro Ala Met Ala Gln Val Gln Leu Gln Gln Pro Gly Ala Glu
                                                     30

Leu Val Lys Pro Gly Ala Ser Val Lys Leu Ser Cys Lys Ala Ser Gly
                                                 45

Tyr Thr Phe Thr Ser Tyr Trp Met His Trp Val Lys Gln Arg Pro Gly
                                             60

Arg Gly Leu Glu Trp Ile Gly Arg Ile Asp Pro Asn Ser Gly Gly Thr
                                                             80

Lys Tyr Asn Glu Lys Phe Lys Ser Lys Ala Thr Leu Thr Val Asp Lys
                                                         95

Pro Ser Ser Thr Ala Tyr Met Gln Leu Ser Ser Leu Thr Ser Glu Asp
                                                    110

Ser Ala Val Tyr Tyr Cys Ala Arg Tyr Asp Tyr Tyr Gly Ser Ser Tyr
                                                125

Phe Asp Tyr Trp Gly Gln Gly Thr Thr Val Thr Val Ser Ser Gly Gly
                                            140

Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gln Ala Val
145                 150                 155                 160

Val Thr Gln Glu Ser Ala Leu Thr Thr Ser Pro Gly Glu Thr Val Thr
                                                        175

Leu Thr Cys Arg Ser Ser Thr Gly Ala Val Thr Thr Ser Asn Tyr Ala
                                                    190

Asn Trp Val Gln Glu Lys Pro Asp His Leu Phe Thr Gly Leu Ile Gly
                                                205

Gly Thr Asn Asn Arg Ala Pro Gly Val Pro Ala Arg Phe Ser Gly Ser
                                            220

Leu Ile Gly Asp Lys Ala Ala Leu Thr Ile Thr Gly Ala Gln Thr Glu
225                 230                 235                 240

Asp Glu Ala Ile Tyr Phe Cys Ala Leu Trp Tyr Ser Asn His Trp Val
                                                        255

Phe Gly Gly Gly Thr Lys Leu Thr Val Leu Gly Leu Glu Ala Pro Ala
                                                    270

Ala Ala Pro Ala Glu Ala Gly Ile Thr Gly Thr Trp Tyr Asn Gln Leu
                                                285

Gly Ser Thr Phe Ile Val Thr Ala Gly Ala Asp Gly Ala Leu Thr Gly
                                            300

Thr Tyr Glu Ser Ala Val Gly Asn Ala Glu Ser Arg Tyr Val Leu Thr
305                 310                 315                 320

Gly Arg Tyr Asp Ser Ala Pro Ala Thr Asp Gly Ser Gly Thr Ala Leu
                                                        335

Gly Trp Thr Val Ala Trp Lys Asn Asn Tyr Arg Asn Ala His Ser Ala
                                                    350

Thr Thr Trp Ser Gly Gln Tyr Val Gly Gly Ala Glu Ala Arg Ile Asn
                                                365

Thr Gln Trp Leu Leu Thr Ser Gly Thr Thr Glu Ala Asn Ala Trp Lys
                                            380

Ser Thr Leu Val Gly His Asp Thr Phe Thr Lys Val Lys Pro Ser Ala
385                 390                 395                 400

Ala Ser

(2) INFORMATION FOR SEQ ID NO: 10:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1259 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

   (iii) HYPOTHETICAL: NO

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATAA TGAAATACCT ATTGCCTACG    60
GCAGCCGCTG GATTGTTATT ACTCGCTGCC CAACCAGCGA TGGCCCAGGT GCAGCTGCAG   120
CAGCCTGGGG CTGAGCTTGT GAAGCCTGGG GCTTCAGTGA AGCTGTCCTG CAAGGCTTCT   180
GGCTACACCT TCACCAGCTA CTGGATGCAC TGGGTGAAGC AGAGGCCTGG ACGAGGCCTT   240
GAGTGGATTG GAAGGATTGA TCCTAATAGT GGTGGTACTA AGTACAATGA GAAGTTCAAG   300
AGCAAGGCCA CACTGACTGT AGACAAACCC TCCAGCACAG CCTACATGCA GCTCAGCAGC   360
CTGACATCTG AGGACTCTGC GGTCTATTAT TGTGCAAGAT ACGATTACTA CGGTAGTAGC   420
TACTTTGACT ACTGGGGCCA AGGGACCACG GTCACCGTCT CCTCAGGTGG AGGCGGTTCA   480
GGCGGAGGTG GCTCTGGCGG TGGCGGATCC CAGGCTGTTG TGACTCAGGA ATCTGCACTC   540
ACCACATCAC CTGGTGAAAC AGTCACACTC ACTTGTCGCT CAAGTACTGG GGCTGTTACA   600
ACTAGTAACT ATGCCAACTG GGTCCAAGAA AAACCAGATC ATTTATTCAC TGGTCTAATA   660
GGTGGTACCA ACAACCGAGC TCCAGGTGTT CCTGCCAGAT TCTCAGGCTC CCTGATTGGA   720
GACAAGGCTG CCCTCACCAT CACAGGGGCA CAGACTGAGG ATGAGGCAAT ATATTTCTGT   780
GCTCTATGGT ACAGCAACCA CTGGGTGTTC GGTGGAGGAA CCAAACTGAC TGTCCTAGGT   840
CTCGAGATCA AGCGCAAGGA ATCTGCAGCT GCCAAGTTCG AGCGGCAGCA CATGGACTCT   900
GGCAACTCCC CCAGCAGCAG CTCCAACTAC TGCAACCTGA TGATGTGCTG CCGAAGATGA   960
CCCAGGGGAA ATGCAAGCCA GTGAACACCT TTGTGCATGA GTCCCTGGCC GATGTTAAGG  1020
CCGTGTGCTC CCAGAAGAAA GTCACTTGCA AGAATGGGCA GACCAACTGC TACCAGAGCA  1080
AATCCACCAT GCGCATCACA GACTGCCGCG AGACTGGCAG CTCCAAGTAC CCCAACTGCG  1140
CCTACAAGAC CACCCAGGTG GAGAAACACA TCATAGTGGC TTGTGGCGGT AAACCGTCCG  1200
TGCCAGTCCA CTTCGATGCT TCAGTGTAGA TCTCCACCTG AGGCCAGAAC AGTGAATTC   1259

(2) INFORMATION FOR SEQ ID NO: 11:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1235 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

   (iii) HYPOTHETICAL: NO

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATAA TGAAATACCT ATTGCCTACG    60
GCAGCCGCTG GATTGTTATT ACTCGCTGCC CAACCAGCGA TGGCCCAGGT GCAGGAGTCA   120
GGACCTGGCC TGGTGGCGCC CTCACAGAGC CTGTCCATCA CATGCACTGT CTCAGGGTTC   180
TCATTAACCA GTTATGGTGT AAGCTGGGTT CGCCAGCCTC CAAGAAAGGG TCTGGAGTGG   240
CTGGGAGTAA TATGGGAAGA CGGGAGCACA AATTATCATT CACGTCTCAT ATCCAGACTG   300
AGCATCAACA AGGATAACTC CAAGAGCCAA GTTTTCTTAA AACTGAACAG TCTGCAAACT   360
GATGACACAG CCACGTACTA CTGTGCCAAA CCCCACTACG GTAGCAGCAA CGTGGGGGCT   420
ATGGAATACT GGGGTCAAGG AACCTCGGTC ACCGTCTCCT CAGGTGGAGG CGGTTCAGGC   480
GGAGGTGGCT CTGGCGGTGG CGGATCGGAC ATCGAGCTCA CCCAGTCTCC AGCCTCCCTA   540
ACTGCATCTG TGGGAGAAAC TGTCACCATC ACCTGTCGAG CAAGTGAAAA TATTTACAGT   600
TATGTAGCAT GGTATCAGCA GAAACAGGGA AAATCTCCTC AGTTCCTGGT CTATAATGCA   660
AAATCCTTAG CAGAGGGTGT GCCATCAAGG TTCAGTGGCA GTGGATCAGG CACACAGTTT   720
TCTCTGAAGA TCAACAGCCT GCAGCCTGAA AATTTTGGGA ATTATTACTG TCAACATCAT   780
TATGTTAGTC CGTGGACGTT CGGTGGAGGC ACCAAGCTCG AGATCAAGCG CAAGGAATCT   840
GCAGCTGCCA AGTTCGAGCG GCAGCACATG GACTCTGGCA ACTCCCCCAG CAGCAGCTCC   900
AACTACTGCA ACCTGATGAT GTGCTGCCGA AGATGACCCA GGGGAAATGC AAGCCAGTGA   960
ACACCTTTGT GCATGAGTCC CTGGCCGATG TTAAGGCCGT GTGCTCCCAG AAGAAAGTCA  1020
CTTGCAAGAA TGGGCAGACC AACTGCTACC AGAGCAAATC CACCATGCGC ATCACAGACT  1080
GCCGCGAGAC TGGCAGCTCC AAGTACCCCA ACTGCGCCTA CAAGACCACC CAGGTGGAGA  1140
AACACATCAT AGTGGCTTGT GGCGGTAAAC CGTCCGTGCC AGTCCACTTC GATGCTTCAG  1200
                 AGGC CAGAACAGTG AATTC                             1235

(2) INFORMATION FOR SEQ ID NO: 12:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1226 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

   (iii) HYPOTHETICAL: NO

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATAA TGAAATACCT ATTGCCTACG    60
    - GCAGCCGCTG GATTGTTATT ACTCGCTGCC CAACCAGCGA TGGCCCAGGT GC - #AGCTGCAG        120                                                                          - GAGTCAGGAC CTGGCCTGGT GGCGCCCTCA CAGACGCTGT CCATCACATG CA - #CCGTCTCA        180                                                                          - GGGTTCTCAT TAACCGGCTA TGGTGTAAAC TGGGTTCGCC AGCCTCCAGG AA - #AGGGTCTG        240                                                                          - GAGTGGCTGG GAATGATTTG GGGTGATGGA AACACAGACT ATAATTCAGC TC - #TCAAATCC        300                                                                          - AGACTGAGCA TCAGCAAGGA CAACTCCAAG AGCCAAGTTT TCTTAAAAAT GA - #ACAGTCTG        360                                                                          - CACACTGATG ACACAGCCAG GTACTACTGT GCCAGAGAGA GAGATTATAG GC - #TTGACTAC        420                                                                          - TGGGGCCAAG GCACCACGGT CACCGTCTCC TCAGGTGGAG GCGGTTCAGG CG - #GAGGTGGC        480                                                                          - TCTGGCGGTG GCGGATCGGA CATCGTCATG ACTCAGTCTC CAGCCTCCCT TT - #CTGCGTCT        540                                                                          - GTGGGAGAAA CTGTCACCAT CACATGTCGA GCAAGTGGGA ATATTCACAA TT - #ATTTAGCA        600                                                                          - TGGTATCAGC AGAAACAGGG AAAATCTCCT CAGCTCCTGG TCTATTATAC AA - #CAACCTTA        660                                                                          - GCAGATGGTG TGCCATCAAG GTTCAGTGGC AGTGGATCAG GAACACAATA TT - #CTCTCAAG        720                                                                          - ATCAACAGCC TGCAGCCTGA AGATTTTGGG AGTTATTACT GTCAACATTT TT - #GGAGTACT        780                                                                          - CCTCGGACGT TCGGTGGAGG CACCAAGCTC GAGATCAAGC GCAAGGAATC TG - #CAGCTGCC        840                                          
                                - AAGTTCGAGC GGCAGCACAT GGACTCTGGC AACTCCCCCA GCAGCAGCTC CA - #ACTACTGC        900                                                                          - AACCTGATGA TGTGCTGCCG AAGATGACCC AGGGGAAATG CAAGCCAGTG AA - #CACCTTTG        960                                                                          - TGCATGAGTC CCTGGCCGAT GTTAAGGCCG TGTGCTCCCA GAAGAAAGTC AC - #TTGCAAGA       1020                                                                          - ATGGGCAGAC CAACTGCTAC CAGAGCAAAT CCACCATGCG CATCACAGAC TG - #CCGCGAGA       1080                                                                          - CTGGCAGCTC CAAGTACCCC AACTGCGCCT ACAAGACCAC CCAGGTGGAG AA - #ACACATCA       1140                                                                          - TAGTGGCTTG TGGCGGTAAA CCGTCCGTGC CAGTCCACTT CGATGCTTCA GT - #GTAGATCT       1200                                                                          #            1226  CAGT GAATTC                                                - (2) INFORMATION FOR SEQ ID NO: 13:                                          -      (i) SEQUENCE CHARACTERISTICS:                                          #pairs    (A) LENGTH: 1648 base                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: cDNA                                                -    (iii) HYPOTHETICAL: NO                                                   #13:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATAA TGAAATACCT AT - #TGCCTACG         60                                                                          - GCAGCCGCTG GATTGTTATT ACTCGCTGCC CAACCAGCGA TGGCCCAGGT GC - #AGCTGCAG        120              
                                                            - CAGCCTGGGG CTGAGCTTGT GAAGCCTGGG GCTTCAGTGA AGCTGTCCTG CA - #AGGCTTCT        180                                                                          - GGCTACACCT TCACCAGCTA CTGGATGCAC TGGGTGAAGC AGAGGCCTGG AC - #GAGGCCTT        240                                                                          - GAGTGGATTG GAAGGATTGA TCCTAATAGT GGTGGTACTA AGTACAATGA GA - #AGTTCAAG        300                                                                          - AGCAAGGCCA CACTGACTGT AGACAAACCC TCCAGCACAG CCTACATGCA GC - #TCAGCAGC        360                                                                          - CTGACATCTG AGGACTCTGC GGTCTATTAT TGTGCAAGAT ACGATTACTA CG - #GTAGTAGC        420                                                                          - TACTTTGACT ACTGGGGCCA AGGGACCACG GTCACCGTCT CCTCAGGTGG AG - #GCGGTTCA        480                                                                          - GGCGGAGGTG GCTCTGGCGG TGGCGGATCC CAGGCTGTTG TGACTCAGGA AT - #CTGCACTC        540                                                                          - ACCACATCAC CTGGTGAAAC AGTCACACTC ACTTGTCGCT CAAGTACTGG GG - #CTGTTACA        600                                                                          - ACTAGTAACT ATGCCAACTG GGTCCAAGAA AAACCAGATC ATTTATTCAC TG - #GTCTAATA        660                                                                          - GGTGGTACCA ACAACCGAGC TCCAGGTGTT CCTGCCAGAT TCTCAGGCTC CC - #TGATTGGA        720                                                                          - GACAAGGCTG CCCTCACCAT CACAGGGGCA CAGACTGAGG ATGAGGCAAT AT - #ATTTCTGT        780                                                                          - GCTCTATGGT ACAGCAACCA CTGGGTGTTC GGTGGAGGAA CCAAACTGAC TG - #TCCTAGGT        840                                                                          - CTCGAGATTA AACGTATGCT TAAGATCGCT GCTTTCAACA TACGTACCTT CG - 
#GTGAATCT        900                                                                          - AAAATGTCTA ACGCTACGCT AGCATCTTAC ATCGTACGCA TCGTACGCCG TT - #ACGATATC        960                                                                          - GTTCTGATCC AGGAAGTTCG CGACTCTCAC CTGGTTGCAG TTGGTAAACT TC - #TAGACTAC       1020                                                                          - CTGAACCAGG ACGACCCGAA CACCTACCAC TACGTTGTTT CTGAACCCCT CG - #GGCGTAAC       1080                                                                          - TCTTACAAAG AACGGTACCT GTTCCTGTTC CGTCCGAACA AAGTTTCAGT AC - #TGGATACC       1140                                                                          - TACCAGTACG ACGACGGATG CGAATCTTGC GGTAACGACT CTTTCTCCCG GG - #AACCGGCT       1200                                                                          - GTTGTTAAAT TCTCGAGCCA CTCTACCAAG GTTAAAGAGT TCGCTATCGT TG - #CTCTGCAC       1260                                                                          - AGCGCGCCGT CTGACGCTGT TGCTGAAATC AACTCTCTGT ACGACGTTTA CC - #TGGACGTT       1320                                                                          - CAGCAGAAAT GGCACCTGAA CGACGTCATG CTGATGGGTG ACTTCAACGC TG - #ACTGCTCT       1380                                                                          - TATGTAACCT CTTCTCAGTG GTCATCGATT CGTCTGCGCA CCTCGTCGAC CT - #TCCAGTGG       1440                                                                          - CTGATCCCGG ACTCCGCTGA CACCACCGCT ACTAGTACCA ACTGCGCTTA CG - #ACCGTATC       1500                                                                          - GTTGTTGCTG GATCCCTGCT GCAGTCTTCT GTTGTACCGG GTAGCGCGGC CC - #CGTTCGAC       1560                                                                          - TTCCAGGCTG CATATGGTCT TTCGAACGAA ATGGCGCTGG CCATCTCTGA TC - #ACTACCCG       1620                                                                          #           1648   CCTA ATTCTAGA  
                                            - (2) INFORMATION FOR SEQ ID NO: 14:                                          -      (i) SEQUENCE CHARACTERISTICS:                                          #pairs    (A) LENGTH: 1624 base                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: cDNA                                                -    (iii) HYPOTHETICAL: NO                                                   #14:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATAA TGAAATACCT AT - #TGCCTACG         60                                                                          - GCAGCCGCTG GATTGTTATT ACTCGCTGCC CAACCAGCGA TGGCCCAGGT GC - #AGGAGTCA        120                                                                          - GGACCTGGCC TGGTGGCGCC CTCACAGAGC CTGTCCATCA CATGCACTGT CT - #CAGGGTTC        180                                                                          - TCATTAACCA GTTATGGTGT AAGCTGGGTT CGCCAGCCTC CAAGAAAGGG TC - #TGGAGTGG        240                                                                          - CTGGGAGTAA TATGGGAAGA CGGGAGCACA AATTATCATT CACGTCTCAT AT - #CCAGACTG        300                                                                          - AGCATCAACA AGGATAACTC CAAGAGCCAA GTTTTCTTAA AACTGAACAG TC - #TGCAAACT        360                                                                          - GATGACACAG CCACGTACTA CTGTGCCAAA CCCCACTACG GTAGCAGCAA CG - #TGGGGGCT        420                                                                          - ATGGAATACT GGGGTCAAGG AACCTCGGTC ACCGTCTCCT CAGGTGGAGG CG - #GTTCAGGC        480                                                                          - 
GGAGGTGGCT CTGGCGGTGG CGGATCGGAC ATCGAGCTCA CCCAGTCTCC AG - #CCTCCCTA        540                                                                          - ACTGCATCTG TGGGAGAAAC TGTCACCATC ACCTGTCGAG CAAGTGAAAA TA - #TTTACAGT        600                                                                          - TATGTAGCAT GGTATCAGCA GAAACAGGGA AAATCTCCTC AGTTCCTGGT CT - #ATAATGCA        660                                                                          - AAATCCTTAG CAGAGGGTGT GCCATCAAGG TTCAGTGGCA GTGGATCAGG CA - #CACAGTTT        720                                                                          - TCTCTGAAGA TCAACAGCCT GCAGCCTGAA AATTTTGGGA ATTATTACTG TC - #AACATCAT        780                                                                          - TATGTTAGTC CGTGGACGTT CGGTGGAGGC ACCAAGCTCG AGATTAAACG TA - #TGCTTAAG        840                                                                          - ATCGCTGCTT TCAACATACG TACCTTCGGT GAATCTAAAA TGTCTAACGC TA - #CGCTAGCA        900                                                                          - TCTTACATCG TACGCATCGT ACGCCGTTAC GATATCGTTC TGATCCAGGA AG - #TTCGCGAC        960                                                                          - TCTCACCTGG TTGCAGTTGG TAAACTTCTA GACTACCTGA ACCAGGACGA CC - #CGAACACC       1020                                                                          - TACCACTACG TTGTTTCTGA ACCCCTCGGG CGTAACTCTT ACAAAGAACG GT - #ACCTGTTC       1080                                                                          - CTGTTCCGTC CGAACAAAGT TTCAGTACTG GATACCTACC AGTACGACGA CG - #GATGCGAA       1140                                                                          - TCTTGCGGTA ACGACTCTTT CTCCCGGGAA CCGGCTGTTG TTAAATTCTC GA - #GCCACTCT       1200                                                                          - ACCAAGGTTA AAGAGTTCGC TATCGTTGCT CTGCACAGCG CGCCGTCTGA CG - #CTGTTGCT       1260                                                
                          - GAAATCAACT CTCTGTACGA CGTTTACCTG GACGTTCAGC AGAAATGGCA CC - #TGAACGAC       1320                                                                          - GTCATGCTGA TGGGTGACTT CAACGCTGAC TGCTCTTATG TAACCTCTTC TC - #AGTGGTCA       1380                                                                          - TCGATTCGTC TGCGCACCTC GTCGACCTTC CAGTGGCTGA TCCCGGACTC CG - #CTGACACC       1440                                                                          - ACCGCTACTA GTACCAACTG CGCTTACGAC CGTATCGTTG TTGCTGGATC CC - #TGCTGCAG       1500                                                                          - TCTTCTGTTG TACCGGGTAG CGCGGCCCCG TTCGACTTCC AGGCTGCATA TG - #GTCTTTCG       1560                                                                          - AACGAAATGG CGCTGGCCAT CTCTGATCAC TACCCGGTTG AGGTAACCCT GA - #CCTAATTC       1620                                                                          #           1624                                                              - (2) INFORMATION FOR SEQ ID NO: 15:                                          -      (i) SEQUENCE CHARACTERISTICS:                                          #pairs    (A) LENGTH: 1615 base                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: cDNA                                                -    (iii) HYPOTHETICAL: NO                                                   #15:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - AAGCTTGCAT GCAAATTCTA TTTCAAGGAG ACAGTCATAA TGAAATACCT AT - #TGCCTACG         60                                                                          - GCAGCCGCTG GATTGTTATT ACTCGCTGCC CAACCAGCGA TGGCCCAGGT GC - #AGCTGCAG        120                    
                                                      - GAGTCAGGAC CTGGCCTGGT GGCGCCCTCA CAGACGCTGT CCATCACATG CA - #CCGTCTCA        180                                                                          - GGGTTCTCAT TAACCGGCTA TGGTGTAAAC TGGGTTCGCC AGCCTCCAGG AA - #AGGGTCTG        240                                                                          - GAGTGGCTGG GAATGATTTG GGGTGATGGA AACACAGACT ATAATTCAGC TC - #TCAAATCC        300                                                                          - AGACTGAGCA TCAGCAAGGA CAACTCCAAG AGCCAAGTTT TCTTAAAAAT GA - #ACAGTCTG        360                                                                          - CACACTGATG ACACAGCCAG GTACTACTGT GCCAGAGAGA GAGATTATAG GC - #TTGACTAC        420                                                                          - TGGGGCCAAG GCACCACGGT CACCGTCTCC TCAGGTGGAG GCGGTTCAGG CG - #GAGGTGGC        480                                                                          - TCTGGCGGTG GCGGATCGGA CATCGTCATG ACTCAGTCTC CAGCCTCCCT TT - #CTGCGTCT        540                                                                          - GTGGGAGAAA CTGTCACCAT CACATGTCGA GCAAGTGGGA ATATTCACAA TT - #ATTTAGCA        600                                                                          - TGGTATCAGC AGAAACAGGG AAAATCTCCT CAGCTCCTGG TCTATTATAC AA - #CAACCTTA        660                                                                          - GCAGATGGTG TGCCATCAAG GTTCAGTGGC AGTGGATCAG GAACACAATA TT - #CTCTCAAG        720                                                                          - ATCAACAGCC TGCAGCCTGA AGATTTTGGG AGTTATTACT GTCAACATTT TT - #GGAGTACT        780                                                                          - CCTCGGACGT TCGGTGGAGG CACCAAGCTC GAGATTAAAC GTATGCTTAA GA - #TCGCTGCT        840                                                                          - TTCAACATAC GTACCTTCGG TGAATCTAAA ATGTCTAACG CTACGCTAGC AT - #CTTACATC   
     900                                                                          - GTACGCATCG TACGCCGTTA CGATATCGTT CTGATCCAGG AAGTTCGCGA CT - #CTCACCTG        960                                                                          - GTTGCAGTTG GTAAACTTCT AGACTACCTG AACCAGGACG ACCCGAACAC CT - #ACCACTAC       1020                                                                          - GTTGTTTCTG AACCCCTCGG GCGTAACTCT TACAAAGAAC GGTACCTGTT CC - #TGTTCCGT       1080                                                                          - CCGAACAAAG TTTCAGTACT GGATACCTAC CAGTACGACG ACGGATGCGA AT - #CTTGCGGT       1140                                                                          - AACGACTCTT TCTCCCGGGA ACCGGCTGTT GTTAAATTCT CGAGCCACTC TA - #CCAAGGTT       1200                                                                          - AAAGAGTTCG CTATCGTTGC TCTGCACAGC GCGCCGTCTG ACGCTGTTGC TG - #AAATCAAC       1260                                                                          - TCTCTGTACG ACGTTTACCT GGACGTTCAG CAGAAATGGC ACCTGAACGA CG - #TCATGCTG       1320                                                                          - ATGGGTGACT TCAACGCTGA CTGCTCTTAT GTAACCTCTT CTCAGTGGTC AT - #CGATTCGT       1380                                                                          - CTGCGCACCT CGTCGACCTT CCAGTGGCTG ATCCCGGACT CCGCTGACAC CA - #CCGCTACT       1440                                                                          - AGTACCAACT GCGCTTACGA CCGTATCGTT GTTGCTGGAT CCCTGCTGCA GT - #CTTCTGTT       1500                                                                          - GTACCGGGTA GCGCGGCCCC GTTCGACTTC CAGGCTGCAT ATGGTCTTTC GA - #ACGAAATG       1560                                                                          - GCGCTGGCCA TCTCTGATCA CTACCCGGTT GAGGTAACCC TGACCTAATT CT - #AGA            1615                                                                          - (2) INFORMATION FOR SEQ ID NO: 16:          
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 4 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Gly Gly Gly Ser

(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 26 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
#              26  AGTC AGGACC

(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 30 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
#           30     TCAC TCAGTCTCCA

(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 36 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
#       36         CTGT TGTGACTCAG GAATCT

(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 42 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
#  42              AGGT CTCGAGTAAT AAGAATTCAT GC

(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 23 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
#                23AGCC TGG

(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 32 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
#          32      GGTC ACCGTCTCCT CA

(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 7 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
                                             #23:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - Ala Pro Ala Ala Ala Pro Ala                                                 1               5                                                             - (2) INFORMATION FOR SEQ ID NO: 24:                                          -      (i) SEQUENCE CHARACTERISTICS:                                          #pairs    (A) LENGTH: 1259 base                                                         (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                -     (ii) MOLECULE TYPE: cDNA                                                -    (iii) HYPOTHETICAL: NO                                                   #24:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:                                   - ATGAAATACC TATTGCCTAC GGCAGCCGCT GGATTGTTAT TACTCGCTGC CC - #AACCAGCG         60                                                                          - ATGGCCCAGC TGCAGGAGTC AGGACCTGGC CTGGTGGCGC CCTCACAGAG CC - #TGTCCATC        120                                                                          - ACATGCACTG TCTCAGGGTT CTCATTAACC AGTTATGGTG TAAGCTGGGT TC - #GCCAGCCT        180                                                                          - CCAAGAAAGG GTCTGGAGTG GCTGGGAGTA ATATGGGAAG ACGGGAGCAC AA - #ATTATCAT        240                                                                          - TCACGTCTCA TATCCAGACT GAGCATCAAC AAGGATAACT CCAAGAGCCA AG - #TTTTCTTA        300                                                                          - AAACTGAACA GTCTGCAAAC TGATGACACA GCCACGTACT ACTGTGCCAA AC - #CCCACTAC        360                                                                          - GGTAGCAGCA ACGTGGGGGC TATGGAATAC TGGGGTCAAG GAACCTCGGT CA - #CCGTCTCC        420 
                                                                         - TCAGGTGGAG GCGGTTCAGG CGGAGGTGGC TCTGGCGGTG GCGGATCGGA CA - #TCGAGCTC        480                                                                          - ACCCAGTCTC CAGCCTCCCT AACTGCATCT GTGGGAGAAA CTGTCACCAT CA - #CCTGTCGA        540                                                                          - GCAAGTGAAA ATATTTACAG TTATGTAGCA TGGTATCAGC AGAAACAGGG AA - #AATCTCCT        600                                                                          - CAGTTCCTGG TCTATAATGC AAAATCCTTA GCAGAGGGTG TGCCATCAAG GT - #TCAGTGGC        660                                                                          - AGTGGATCAG GCACACAGTT TTCTCTGAAG ATCAACAGCC TGCAGCCTGA AG - #ATTTTGGG        720                                                                          - AATTATTACT GTCAACATCA TTATGTTAGT CCGTGGACGT TCGGTGGAGG CA - #CCAAGCTC        780                                                                          - GAGATCAAGC GCTCTAGCCT CGAAGGTGGG TGCGCTGGTA ATAGAGTCAG AA - #GATCAGTC        840                                                                          - GGAAGCAGCC TGTCTTGCGG TGGTCTCGAC GTCGAGATCA AGCGCAAGGA AT - #CTGCAGCT        900                                                                          - GCCAAGTTCG AGCGGCAGCA CATGGACTCT GGCAACTCCC CCAGCAGCAG CT - #CCAACTAC        960                                                                          - TGCAACCTGA TGATGTGCTG CCGAAGATGA CCCAGGGGAA ATGCAAGCCA GT - #GAACACCT       1020                                                                          - TTGTGCATGA GTCCCTGGCC GATGTTAAGG CCGTGTGCTC CCAGAAGAAA GT - #CACTTGCA       1080                                                                          - AGAATGGGCA GACCAACTGC TACCAGAGCA AATCCACCAT GCGCATCACA GA - #CTGCCGCG       1140                                                                          - AGACTGGCAG CTCCAAGTAC CCCAACTGCG CCTACAAGAC 
CACCCAGGTG GAGAAACACA    1200
TCATAGTGGC TTGTGGCGGT AAACCGTCCG TGCCAGTCCA CTTCGATGCT TCAGTGTAG      1259

(2) INFORMATION FOR SEQ ID NO: 25:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1178 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATGAAATACC TATTGCCTAC GGCAGCCGCT GGATTGTTAT TACTCGCTGC CCAACCAGCG      60
ATGGCCCAGC TGCAGGAGTC AGGACCTGGC CTGGTGGCGC CCTCACAGAG CCTGTCCATC     120
ACATGCACTG TCTCAGGGTT CTCATTAACC AGTTATGGTG TAAGCTGGGT TCGCCAGCCT     180
CCAAGAAAGG GTCTGGAGTG GCTGGGAGTA ATATGGGAAG ACGGGAGCAC AAATTATCAT     240
TCACGTCTCA TATCCAGACT GAGCATCAAC AAGGATAACT CCAAGAGCCA AGTTTTCTTA     300
AAACTGAACA GTCTGCAAAC TGATGACACA GCCACGTACT ACTGTGCCAA ACCCCACTAC     360
GGTAGCAGCA ACGTGGGGGC TATGGAATAC TGGGGTCAAG GAACCTCGGT CACCGTCTCC     420
TCAGGTGGAG GCGGTTCAGG CGGAGGTGGC TCTGGCGGTG GCGGATCGGA CATCGAGCTC     480
ACCCAGTCTC CAGCCTCCCT AACTGCATCT GTGGGAGAAA CTGTCACCAT CACCTGTCGA     540
GCAAGTGAAA ATATTTACAG TTATGTAGCA TGGTATCAGC AGAAACAGGG AAAATCTCCT     600
CAGTTCCTGG TCTATAATGC AAAATCCTTA GCAGAGGGTG TGCCATCAAG GTTCAGTGGC     660
AGTGGATCAG GCACACAGTT TTCTCTGAAG ATCAACAGCC TGCAGCCTGA AGATTTTGGG     720
AATTATTACT GTCAACATCA TTATGTTAGT CCGTGGACGT TCGGTGGAGG CACCAAGCTC     780
GAGATCAAGC GCAAGGAATC TGCAGCTGCC AAGTTCGAGC GGCAGCACAT GGACTCTGGC     840
AACTCCCCCA GCAGCAGCTC CAACTACTGC AACCTGATGA TGTGCTGCCG AAGATGACCC     900
AGGGGAAATG CAAGCCAGTG AACACCTTTG TGCATGAGTC CCTGGCCGAT GTTAAGGCCG     960
TGTGCTCCCA GAAGAAAGTC ACTTGCAAGA ATGGGCAGAC CAACTGCTAC CAGAGCAAAT    1020
CCACCATGCG CATCACAGAC TGCCGCGAGA CTGGCAGCTC CAAGTACCCC AACTGCGCCT    1080
ACAAGACCAC CCAGGTGGAG AAACACATCA TAGTGGCTTG TGGCGGTAAA CCGTCCGTGC    1140
                 TTCA GTGAAGGACG AACTGTAA                            1178

(2) INFORMATION FOR SEQ ID NO: 26:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1295 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

ATGAAATACC TATTGCCTAC GGCAGCCGCT GGATTGTTAT TACTCGCTGC CCAACCAGCG      60
ATGGCCCAGC TGCAGGAGTC AGGACCTGGC CTGGTGGCGC CCTCACAGAG CCTGTCCATC     120
ACATGCACTG TCTCAGGGTT CTCATTAACC AGTTATGGTG TAAGCTGGGT TCGCCAGCCT     180
CCAAGAAAGG GTCTGGAGTG GCTGGGAGTA ATATGGGAAG ACGGGAGCAC AAATTATCAT     240
TCACGTCTCA TATCCAGACT GAGCATCAAC AAGGATAACT CCAAGAGCCA AGTTTTCTTA     300
AAACTGAACA GTCTGCAAAC TGATGACACA GCCACGTACT ACTGTGCCAA ACCCCACTAC     360
GGTAGCAGCA ACGTGGGGGC TATGGAATAC TGGGGTCAAG GAACCTCGGT CACCGTCTCC     420
TCAGGTGGAG GCGGTTCAGG CGGAGGTGGC TCTGGCGGTG GCGGATCGGA CATCGAGCTC     480
ACCCAGTCTC CAGCCTCCCT AACTGCATCT GTGGGAGAAA CTGTCACCAT CACCTGTCGA     540
GCAAGTGAAA ATATTTACAG TTATGTAGCA TGGTATCAGC AGAAACAGGG AAAATCTCCT     600
CAGTTCCTGG TCTATAATGC AAAATCCTTA GCAGAGGGTG TGCCATCAAG GTTCAGTGGC     660
AGTGGATCAG GCACACAGTT TTCTCTGAAG ATCAACAGCC TGCAGCCTGA AGATTTTGGG     720
AATTATTACT GTCAACATCA TTATGTTAGT CCGTGGACGT TCGGTGGAGG CACCAAGCTC     780
GAGATCAAGC GCTCTAGCCT CGAAGGTGGG TGCGCTGGTA ATAGAGTCAG AAGATCAGTC     840
GGAAGCAGCC TGTCTTGCGG TGGTCTCGAC GTCGAGATCA AGGCACCTGC TGCCTCCCCG     900
GCAGACGCTA AGGAATCTGC AGCTGCCAAG TTCGAGCGGC AGCACATGGA CTCTGGCAAC     960
TCCCCCAGCA GCAGCTCCAA CTACTGCAAC CTGATGATGT GCTGCCGAAG ATGACCCAGG    1020
GGAAATGCAA GCCAGTGAAC ACCTTTGTGC ATGAGTCCCT GGCCGATGTT AAGGCCGTGT    1080
GCTCCCAGAA GAAAGTCACT TGCAAGAATG GGCAGACCAA CTGCTACCAG AGCAAATCCA    1140
CCATGCGCAT CACAGACTGC CGCGAGACTG GCAGCTCCAA GTACCCCAAC TGCGCCTACA    1200
AGACCACCCA GGTGGAGAAA CACATCATAG TGGCTTGTGG CGGTAAACCG TCCGTGCCAG    1260
                 AGTG AAGGACGAAC TGTAA                               1295

(2) INFORMATION FOR SEQ ID NO: 27:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1202 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

ATGAAATACC TATTGCCTAC GGCAGCCGCT GGATTGTTAT TACTCGCTGC CCAACCAGCG      60
ATGGCCCAGC TGCAGGAGTC AGGACCTGGC CTGGTGGCGC CCTCACAGAG CCTGTCCATC     120
ACATGCACTG TCTCAGGGTT CTCATTAACC AGTTATGGTG TAAGCTGGGT TCGCCAGCCT     180
CCAAGAAAGG GTCTGGAGTG GCTGGGAGTA ATATGGGAAG ACGGGAGCAC AAATTATCAT     240
TCACGTCTCA TATCCAGACT GAGCATCAAC AAGGATAACT CCAAGAGCCA AGTTTTCTTA     300
AAACTGAACA GTCTGCAAAC TGATGACACA GCCACGTACT ACTGTGCCAA ACCCCACTAC     360
GGTAGCAGCA ACGTGGGGGC TATGGAATAC TGGGGTCAAG GAACCTCGGT CACCGTCTCC     420
TCAGGTGGAG GCGGTTCAGG CGGAGGTGGC TCTGGCGGTG GCGGATCGGA CATCGAGCTC     480
ACCCAGTCTC CAGCCTCCCT AACTGCATCT GTGGGAGAAA CTGTCACCAT CACCTGTCGA     540
GCAAGTGAAA ATATTTACAG TTATGTAGCA TGGTATCAGC AGAAACAGGG AAAATCTCCT     600
CAGTTCCTGG TCTATAATGC AAAATCCTTA GCAGAGGGTG TGCCATCAAG GTTCAGTGGC     660
AGTGGATCAG GCACACAGTT TTCTCTGAAG ATCAACAGCC TGCAGCCTGA AGATTTTGGG     720
AATTATTACT GTCAACATCA TTATGTTAGT CCGTGGACGT TCGGTGGAGG CACCAAGCTC     780
GAGATCAAGG CACCTGCTGC CTCCCCGGCA GACGCTAAGG AATCTGCAGC TGCCAAGTTC     840
GAGCGGCAGC ACATGGACTC TGGCAACTCC CCCAGCAGCA GCTCCAACTA CTGCAACCTG     900
ATGATGTGCT GCCGAAGATG ACCCAGGGGA AATGCAAGCC AGTGAACACC TTTGTGCATG     960
AGTCCCTGGC CGATGTTAAG GCCGTGTGCT CCCAGAAGAA AGTCACTTGC AAGAATGGGC    1020
AGACCAACTG CTACCAGAGC AAATCCACCA TGCGCATCAC AGACTGCCGC GAGACTGGCA    1080
GCTCCAAGTA CCCCAACTGC GCCTACAAGA CCACCCAGGT GGAGAAACAC ATCATAGTGG    1140
CTTGTGGCGG TAAACCGTCC GTGCCAGTCC ACTTCGATGC TTCAGTGAAG GACGAACTGT    1200
                                                                     1202

(2) INFORMATION FOR SEQ ID NO: 28:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1178 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

ATGAAATACC TATTGCCTAC GGCAGCCGCT GGATTGTTAT TACTCGCTGC CCAACCAGCG      60
ATGGCCCAGC TGCAGGAGTC AGGACCTGGC CTGGTGGCGC CCTCACAGAG CCTGTCCATC     120
ACATGCACTG TCTCAGGGTT CTCATTAACC AGTTATGGTG TAAGCTGGGT TCGCCAGCCT     180
CCAAGAAAGG GTCTGGAGTG GCTGGGAGTA ATATGGGAAG ACGGGAGCAC AAATTATCAT     240
TCACGTCTCA TATCCAGACT GAGCATCAAC AAGGATAACT CCAAGAGCCA AGTTTTCTTA     300
AAACTGAACA GTCTGCAAAC TGATGACACA GCCACGTACT ACTGTGCCAA ACCCCACTAC     360
GGTAGCAGCA ACGTGGGGGC TATGGAATAC TGGGGTCAAG GAACCTCGGT CACCGTCTCC     420
TCAGGTGGAG GCGGTTCAGG CGGAGGTGGC TCTGGCGGTG GCGGATCGGA CATCGAGCTC     480
ACCCAGTCTC CAGCCTCCCT AACTGCATCT GTGGGAGAAA CTGTCACCAT CACCTGTCGA     540
GCAAGTGAAA ATATTTACAG TTATGTAGCA TGGTATCAGC AGAAACAGGG AAGATCTCCT     600
CAGTTCCTGG TCTATAATGC AAAATCCTTA GCAGAGGGTG TGCCATCAAG GTTCAGTGGC     660
AGTGGATCAG GCACACAGTT TTCTCTGAAG ATCAACAGCC TGCAGCCTGA AAATTTTGGG     720
AATTATTACT GTCAACATCA TTATGTTAGT CCGTGGACGT TCGGTGGAGG CACCAAGCTC     780
GAGATCAAGC GCAAGGAATC TGCAGCTGCC AAGTTCGAGC GGCAGCACAT GGACTCTGGC     840
AACTCCCCCA GCAGCAGCTC CAACTACTGC AACCTGATGA TGTGCTGCCG AAGATGACCC     900
AGGGGAAATG CAAGCCAGTG AACACCTTTG TGCATGAGTC CCTGGCCGAT GTTAAGGCCG     960
TGTGCTCCCA GAAGAAAGTC ACTTGCAAGA ATGGGCAGAC CAACTGCTAC CAGAGCAAAT    1020
CCACCATGCG CATCACAGAC TGCCGCGAGA CTGGCAGCTC CAAGTACCCC AACTGCGCCT    1080
ACAAGACCAC CCAGGTGGAG AAACACATCA TAGTGGCTTG TGGCGGTAAA CCGTCCGTGC    1140
                 TTCA GTGAAGGACG AACTGTAA                            1178

(2) INFORMATION FOR SEQ ID NO: 29:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
   (iii) HYPOTHETICAL: NO
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Lys Asp Glu Leu
__________________________________________________________________________

We claim:
 1. A compound comprising a target cell-specific portion and a cytotoxic portion characterised in that the cytotoxic portion has DNA endonucleolytic activity.
 2. A compound according to claim 1 wherein the cytotoxic portion is at least the catalytically active portion of a DNA endonuclease.
 3. A compound according to claim 2 wherein the endonuclease is a mammalian deoxyribonuclease I.
 4. A compound according to claim 3 wherein a nuclear localization signal is incorporated.
 5. A compound according to claim 4 wherein the nuclear localization signal comprises the sequence PKKKRKV (SEQ ID NO: 1).
 6. A compound according to claim 6 wherein the DNA endonuclease is a restriction endonuclease.
 7. A compound according to claim 1, wherein the target cell-specific portion comprises a ScFv.
 8. A compound according to claim 1, wherein the target cell-specific portion and the cytotoxic portion are fused.
 9. A compound according to claim 1, wherein the target cell-specific portion binds selectively to a tumour cell.
 10. A composition comprising the compound of claim 1 and a pharmaceutical carrier. 